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We introduce the concept of directed loops in stochastic series expansion and path integral quan- 
tum Monte Carlo methods. Using the detailed balance rules for directed loops, we show that it is 
possible to smoothly connect generally applicable simulation schemes (in which it is necessary to 
include back-tracking processes in the loop construction) to more restricted loop algorithms that 
can be constructed only for a limited range of Hamiltonians (where back-tracking can be avoided). 
The "algorithmic discontinuities" between general and special points (or regions) in parameter space 
can hence be eliminated. As a specific example, we consider the anisotropic S = 1/2 Heisenberg 
antiferromagnet in an external magnetic field. We show that directed loop simulations are very 
efficient for the full range of magnetic fields (zero to the saturation point) and anisotropies. In 
particular for weak fields and anisotropies, the autocorrelations are significantly reduced relative to 
those of previous approaches. The back-tracking probability vanishes continuously as the isotropic 
Heisenberg point is approached. For the XY-model, we show that back-tracking can be avoided 
for all fields extending up to the saturation field. The method is hence particularly efficient in this 
case. We use directed loop simulations to study the magnetization process in the 2D Heisenberg 
model at very low temperatures. For L x L lattices with L up to 64, we utilize the step-structure 
in the magnetization curve to extract gaps between different spin sectors. Finite-size scaling of 
the gaps gives an accurate estimate of the transverse susceptibility in the thermodynamic limit: 
X± = 0.0659 ± 0.0002. 

PACS numbers: 05.10.-a, 05.30.-d, 75.10.Jm, 75.40.Mg 



I. INTRODUCTION 

In recent years, significant advances in quantum Monte 
Carlo (QMC) algorithms have opened up several classes 
of quantum many-body models to the kind of large- 
scale numerical studies that were previously possible only 
for classical systems. The progress has been along two 
main lines: (i) the elimination ||, ||, [|, || of the sys- 
tematic error of the Trotter decomposition on which 
most of the early finite-temperature QMC algorithms 
j^, ||, ||, |l0|, |ll[ were based (with the exception of Hand- 
scomb's method jl2], |l3[ Q the utility of which was 
limited), and (ii) the development of loop-cluster algo- 
rithms |l6| ] for efficient sampling in th e q uantum me- 
chanical configuration space (|, [| |l7], [ll| |l9). Algo- 
rithms incorporating both (i) and (ii) have been devised 
starting from either the Euclidean path integral (world- 
line QMC methods operating in continuous imaginary 
time H f§) or the power series expansion of the partition 
function (stochastic series expansion, hereafter SSE [ pj| , 
which is an extension of Handscomb's method). While 
the Trotter error is a controllable one and can be elimi- 
nated in standard approaches by extrapolating results for 
different imaginary time discretizations to the continuum 
H , its absence directly at the level of the simulation 
can imply considerable time savings when unbiased re- 
sults are needed, e.g., in finite-size scaling studies. The 



loop-cluster algorithms (world-line loops (l6| Jl7|, |l9|, SSE 
operator- loops ]l8| |, and world- line worms have of- 
fered even more dramatic speed-ups, in many cases re- 
ducing autocorrelation times by several orders of mag- 
nitude and thus enabling studies of system sizes much 
larger than what was possible with local sampling al- 
gorithms. In addition, in some special cases, fermionic 
and other sign problems can be eliminated with the loop- 
cluster algorithms El], E2, 23 . 



The new QMC methods have become important tools 
for quantum many-body research in condensed matter 
physics (with applications to quantum spin s p4|, p5| , |26| , 
|7|, H |2§ |3C], |3| !| ]3§ !§, bosons JP, |3§ |37|, and 
one-dimensional fermion systems |j38[ |39|| ) as well as in 
lattice gauge theory B2|. An important property of 
some of the loop-cluster algorithms is that they are effi- 
cient also in the presence of external fields jl8[ [l9|, |2^, (lO) . 
In particular the SSE algorithm with the operator-loop 
update 1 18 has proven very powerful in several recent 
studies of quantum spin systems ]3^, |34|] and boson sys- 
tems 1 35, 39, H?]] including, respectively, a magnetic field 
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and a chemical potential. It is interesting to note that in 
this respect QMC algorithms now perform better than 
classical Monte Carlo, since in the latter case external 
fields still pose challenging problems. 

In this paper we present a general framework for con- 
structing loop-type algorithms both in SSE and path in- 
tegral methods. We focus primarily on the SSE approach, 
which owing to the manifestly discrete nature of its con- 
figuration space is easier to implement and, for the same 
reason, also is more efficient in most cases. In the SSE 
operator-loop update introduced in Ref. [L8L a distinction 
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was made between a general algorithm (where it is neces- 
sary to allow the propagating end of the operator path to 
back-track) and special ones applicable only for certain 
Hamiltonians (where the paths do not back-track). For 
example, in the case of the S — 1/2 Heisenberg model 
with uniaxial anisotropy A and external magnetic field h 
(also known as the XXZ-model), 

particularly efficient algorithms were devised at the 
isotropic Heisenberg point (A = 1, h = Q) and for the 
XY-model (A = 0, h = 0). While the general algorithm 
can be used for any A, h, it does not perform as well 
in the limits A — * 1, h — ► and A — * 0, h — > as the 
special algorithms exactly at these points (which are the 
only points at which the more efficient algorithms can be 
used). Hence, one has to switch algorithms when crossing 
the isotropic Heisenberg and XY points. The presence of 
such "algorithmic discontinuities" is clearly bothersome, 
both from a mathematical and practical point of view. 
Here we show how the algorithmic discontinuities can be 
eliminated within a more general framework of satisfying 
detailed balance when constructing the operator-loop. 
For reasons that will become clear below, we call the 
entities involved in this type of update "directed loops" . 
With these, we are able to carry out simulations as effi- 
ciently in the limits approaching the Heisenberg and XY 
points as exactly at those points. We also show that this 
scheme can be easily adapted to continuous-time path 
integrals. 

The outline of the rest of the paper is the following: 
In Sec. II we review the SSE method and the operator- 
loop update on which the new directed loop algorithm is 
based. We outline a proof of detailed balance and also 
discuss a few special cases in which back-tracking can be 
easily avoided in the loop construction. In Sec. Ill wc 
first discuss a more general condition for satisfying de- 
tailed balance in the SSE method, which leads us to the 
directed loop equations. We then show in detail how this 
scheme is applied to the spin- 1/2 XXZ model. We present 
two solutions of the directed loop equations. One is iden- 
tical to the previous generic operator-loop update and 
the other smoothly connects to the special "determinis- 
tic" operator-loop algorithm at the isotropic Heisenberg 
point. We also briefly discuss the structure of the di- 
rected loop equations for a more general class of Hamil- 
tonians. Implementation of directed loops in the path 
integral formalism is discussed in Sec. IV. In Sec. V we 
present simulation results in various parameter regions 
of the XXZ-model. We compare autocorrelation times 
for simulations using the two different directed loop so- 
lutions. We also extract the dynamic exponent in simula- 
tions of isotropic Heisenberg models at critical points in 
one, two, and three dimensions. In Sec. VI, as a demon- 
stration of what can be accomplished with the improved 
solution, we present results for the magnetization as a 
function of the external field in the 2D Heisenberg model 



at very low temperatures. We calculate the magnetic 
susceptibility using gaps between different spin sectors 
extracted from the step-structure in the magnetization 
curve. We conclude with a summary and discussion in 
Sec. VII. In an Appendix wc outline the basic elements 
of a simple and efficient computer implementation of the 
SSE method. 



II. STOCHASTIC SERIES EXPANSION 

The SSE method is a generalization |l], g, [l8| of Hand- 
scomb's power series expansion method for the isotropic 
S = 1/2 Heisenberg ferromagnet and antiferromag- 
net p3| , [l4| to a much wider range of systems. The perfor- 
mance is significantly improved also for the Heisenberg 
model |27| , p8| , plij ]. Early attempts of such generaliza- 
tions [|15| were limited by the difficulties in analytically 
evaluating the traces of the terms of the expansion. This 
problem was solved |^| by the development of a scheme 
for importance sampling also of the individual terms of 
the traces expressed in a conveniently chosen basis. The 
starting point of the SSE method is hence the power se- 
ries expansion of the partition function: 



Trjt 



a) 



(2) 



a n—0 



where the trace has been written as a sum over diago- 
nal matrix elements in a basis Simulation algo- 
rithms based on this expansion can be formulated with- 
out sign problems for the same models as those for which 
world-line methods Q are applicable. There are no ap- 
proximations causing systematic errors and very efficient 
loop-type u pda ting algorithms have also recently been 
devised (l^, gS, 42, 3. A distinct advantage of SSE over 
continuous- time world-line methods J3|, ||] is the discrete 
nature of the configuration space, which can be sampled 
without floating point operations. 

Here we first review an implementation of the SSE 
method for the anisotropic S = 1/2 Heisenberg model. A 
proof of detailed balance in the operator-loop updating 
scheme is then outlined. Several practical issues related 
to the operator-loops are also discussed. Estimators for 
physical observables will not be discussed here. Several 
classes of expectation values have been derived in Rcf. ||. 
Observables of interest in the context of the Heisenberg 
model have been discussed in Ref. |4l]. Off-diagonal corre- 
lation functions (single-particle Green's functions) have 
been studied in Ref. E3. 



A. SSE configuration space 

For the anisotropic Heisenberg antiferromagnet (|l|) 
with N spins it is convenient to use the standard basis 



\a) = \Sf,SI,...,S : 



JV/i 



(3) 
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and to write the Hamiltonian in terms of bond operators 
H b , where 6 refers to a pair of sites i(b),j(b), 



H=-jJ2H bl (J>0). 



(4) 



b=i 



For a d-dimensional cubic lattice the number of bonds 
N b = dN . The bond operators are further decomposed 
into two operators; 

Hb = Hi,b — H2A, (5) 

where H\ b is diagonal and H 2 b off-diagonal; 

H i.b = C - AS-( b) S* {b} + h b [S- {b} + S z ](b) ], (6) 

H 2 . b = ^[St(b) S J(b) + S t(b) S j~(b)}> ( 7 ) 

and we have defined the magnetic field on a bond; hb = 
h/(2dJ). The constant C should be chosen such that all 
matrix elements of Hi t, are positive, i.e., C > A/4 + h b . 
We will henceforth use the notation 



C = C + e, C = A/4 + h b , 



(8) 



where e > 0. In the Hamiltonian ^ we have neglected 
a constant N b C, which should be kept in mind when 
calculating the energy. 

The powers of H in Eq. (||) can be expressed as sums 
of products of the bond operators (||) and ([7]). Such a 
product is conveniently referred to by an operator-index 
sequence 



S n = [at,b±], [a 2 ,b 2 ], [a n ,b n ], 



(9) 



where a; S {1,2} corresponds to the type of operator 
(l=diagonal, 2=off-diagonal) and b{ € {1, . . . ,N b } is the 
bond index. Hence, 



a n=0 S„ 



a ) , (10) 



where f3 = J/T and n 2 is the total number of spin- 
flipping operators [2,6] in S n . It is useful to define nor- 
malized states resulting when \a) is propagated by a frac- 
tion of the SSE operator string: 



Wp))~t[H aitbi \ 



(11) 



Note that there is no branching, i.e., all \a(p)) are ba- 
sis states, and \a(p)) and \a(p + 1)) are either the same 
state or differ only by a flipped pair of spins. In order 
for a term (a, S n ) to contribute to the partition function 
the boundary condition \a(n)) — \a(0)} has to be satis- 
fied. On a bipartite lattice n 2 must therefore be even, 
and the expansion is then positive definite. The terms 
(configurations) can thus be sampled using Monte Carlo 
techniques without sign problems. 



To simplify the Monte Carlo sampling, it is useful Q| 
(although not necessary (|]) to truncate the expansion at 
a maximum power n = M and to insert M — n "fill-in" 
unit operators i?o.o = 1 hi the operator products in all 
possible ways. This gives 



Z = 



EE 



n (M-n)\ 
Ml 



A I 



! = 1 



(12) 



where n now is the number of operators [a^,^] ^ [0,0]. 
One can show that 0, [lj] the average expansion order 



(n)=pN b \E b \ 



(13) 



where E b is the internal energy per bond, E b = —(H b ) 
[including the constant C in rtq)], and that the width of 
the distribution is approximately (n) 1 / 2 . M can there- 
fore be chosen so that n never reaches the cut-off during 
the simulation (M ~ f3N). The truncation error is then 
completely negligible. In practice, M is gradually ad- 
justed during the equilibration of the simulation, so that 
M = a x n max , where n max is the highest n reached. A 
practical range for the factor a is 1.2 — 1.5. The sim- 
ulation can be started with some random state \a) and 
an "empty" operator string [0, 0]i, . . . , [0,0] m (we some 
times use the notation [a, b] p instead of [a p , b p ]). Ergodic 
sampling of the configurations (a, S n ) is accomplished 
using two different types of updates. 



B. Updating scheme 



The first update (diagonal update) is of the type 
[0,0] p <-> [l,6] p , involving a single diagonal operator 
which changes the expansion order n by ±1 pl| . The 
corresponding Metropolis acceptance probabilities are 

F([0 ,o]^[i,6U = n ^(p)\ h Mp)) i (14) 



p([i,b] p ^ [o,oy = 



M-n 
M - n + 1 



N b 0{a(p)\H ltb \a(p)) 



. (15) 



where P > 1 should be interpreted as probability one. 
The presence of N b in these probabilities reflects the fact 
that there are N b random choices for the bond b in a 
substitution [0,0] — > [1,6] but only one way to replace 
[1, 6] — > [0, 0] when 6 is given. These diagonal updates are 
attempted consecutively for all p = 1, ... , M, and at the 
same time the state |a) is propagated when spin flipping 
operators [2, 6] are encountered (these cannot be changed 
in a single-operator update), so that the states \a(p)) are 
available when needed to calculate the probabilities ( |l4| ) 
and (§j). 

The purpose of the second type of update — the 
operator-loop (l8) - is to accomplish substitutions 
[1, b] p <-> [2,6] p for a varying number of operators, 
thereby flipping spins also in several of the propagated 
states (fll). The expansion order n does not change. It 
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FIG. 1: The six different vertices corresponding to the matrix 
eiements in Eqs. (|l8|). The horizontai bar represents the fuil 
bond operator H b and the circies beneath(above) represent 
the spin state (soiid and open circies for spin-| and spin-J., 
respectiveiy) before(after) operation with either the diagonai 
or off-diagonal part of Ht. 



is then convenient to disregard the [0, 0] unit operator 
elements in Sm and instead work with the original se- 
quences S n of Eq. (fiol), which contain only elements [1, b] 
and [2,6]. For the discussion of the operator-loops, the 
propagation index p will refer to this reduced sequence. 
It is also convenient to introduce two-spin states 



K(P)> = |5?(6 P) (P),^ (6P) (P)) ) 



(16) 



i.e., the spins at bond b p in the propagated state \a(p)} 
as defined in (fy]). The weight factor corresponding to 
( |io| ) can then be written as 

on n 

W{a, S n ) = ^l[ (a bp (p)\H bp \a bp (p - 1)) , (17) 
where the non-zero two-spin matrix elements are 



(II \H b \ ||) 

(IT \H b \ IT) 
(Tl \H b \ IT) 
(TT \H b \ TT) 



(Tl \H b \ Tl) 
(IT \H b \ Tl) 

e + 2h b . 



A/2 
1/2, 



(18) 



In principle the value of e > is arbitrary but in practice 
a large constant is inconvenient since the average expan- 
sion order ( p^ ) has a contribution e(3N b . In many cases 
the simulation performs better with a small e > than 
with e = 0, however, as will be demonstrated in Sec. V. 
For e — 0, the number of allowed matrix elements is re- 
duced from 6 to 4 (if h = 0) or 5 (if h > 0). 



The matrix element product in the weight (17) can 
be represented as a network of n vertices, with two 
spins Sf{p — 1), Sj(p — 1) "entering" the p:th vertex and 
Sf(p),Sj(p) "exiting". The six allowed vertices, corre- 
sponding to the non-zero matrix elements (|l8[), are il- 
lustrated in Fig. [l|. The direction of propagation (here 
and in other illustrations) is such that moving upward 
corresponds to increasing the propagation index p. 

In order to carry out the operator-loop update, a linked 
list of the vertices is first constructed. For each of the 
four legs on each vertex there is a spin state and a link 
to the following (in the direction of increasing p) or pre- 
vious (direction of decreasing p) vertex leg at the same 
site. The periodic boundary condition of the propagated 
states must be taken into account, i.e., the links can span 
across p = and every leg then has an outgoing and in- 
coming link (i.e., a bidirectional link). In case a spin 
(site) is acted upon only by a single operator in S n , the 
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FIG. 2: (a) An SSE configuration for a three-site system with 
three operators, shown along with all the propagated states. 
Here open and solid bars indicate diagonal and off-diagonal 
operators, respectively, (b) The linked vertex list correspond- 
ing to (a). The dashed lines represent bidirectional links. 
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FIG. 3: All four paths through two vertices where the en- 
trance is at the low-left leg. The arrow indicates the exit 
leg. The resulting updated vertices, with the spin at the en- 
trance and exit legs flipped, are also shown. The two cases 
marked with an X are forbidden, since the updated vertices 
do not correspond to operators in the Hamiltonian consid- 
ered here. We refer to the four different processes as (a) 
"bounce", (b) "continue-straight", (c) "switch-and-reverse" , 
and (d) "switch-and-continue" . 



corresponding two legs of that vertex are linked together. 
Otherwise, for a site acted upon by two or more opera- 
tors, all links are between different vertices. An exam- 
ple of an SSE configuration and its corresponding linked 
vertex list is shown in Fig. ^. Clearly, in an allowed con- 
figuration links can exist only between legs in the same 
spin state. Note that in the representation with the full 
states in (a), which is never used in the actual simulation 
but is included here for illustrative purposes, we distin- 
guish between diagonal and off-diagonal operators (as is 
also done in the stored operator sequence Sm used in 
the diagonal update). In the vertex representation (b) 
the two-spin states are taken from the full propagated 
states ( |l6| ) and the type of the operator (diagonal or off- 
diagonal) is implicitly given by the four spin states. The 
bar is hence strictly redundant, but we include it in the 
figures as a reminder that the vertices represent matrix 
elements of the bond-operators. 

To construct an operator-loop, one of the 4n vertex 
legs is first selected at random as an initial entrance leg. 
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FIG. 4: Two different ways in which an operator-ioop can 
ciose. The starting points of the loops in (a) and (b) are the 
legs from which the arrows point out. In (a) the last segment 
of the loop connects the initial and final vertices, resulting in 
the starting spin being flipped in the final configuration. In 
(b) the last loop segment is within the initial vertex and the 
starting spin is flipped twice, with the net effect of no change. 
Both loops (a) and (b) here result in the updated vertices 
shown in (c). 

One of the four legs belonging to the same vertex as the 
entrance leg is then chosen as the exit from the vertex, 
and both the entrance and exit spins are flipped. Exam- 
ples of how vertices change in the four types of process are 
shown in Fig. |^. The probability of exiting at a given leg, 
given the entrance leg and the four spin states defining 
the vertex, is taken proportional to that matrix clement 
in ([l8|) which corresponds to the vertex generated when 
the entrance and exit spins have been flipped. As an ex- 
ample, defining matrix elements obtained when flipping 
spins in a vertex as 

W(*$)(p)= (19) 

(/aS?(p), USfo)\H b \fiS?(p - 1), f 2 S]( P - 1)), 

where = —1 if the spin on leg i (i = 1, 2, 3, 4) is flipped 
and fi = +1 if it is not flipped, the probability of exiting 
at leg 2 if the entrance is at leg 1 is given by 

P2A ~ w (++) + w (±t) + w (:+) + w (t;) ' (20) 

where we have used ± for ±1. The reasons for this choice 
for the probability will be discussed in Sec. II C. If the en- 
trance and exit correspond to different sites (the switch- 
and-reverse and switch-and-continue processes in Fig. ||), 
the change in the vertex corresponds to a change of the 
type of the operator (diagonal <-* off-diagonal). The leg 
to which the exit is linked is taken as the entrance to the 
next vertex, from which an exit is again chosen. This 
procedure is repeated until the original starting point is 
reached (the loop closes). The mismatches (links con- 
necting different spin states) existing at the original en- 
trance and at the propagating end of the path are then 
"healed" and a new configuration contributing to the par- 
tition function has been created. Note that, depending 
on the way the loop closes, the spin at the leg from which 
the loop construction was started may or may not be 
flipped after the full loop has been completed. Examples 
illustrating this are given in Fig. ^. 

One of the two site-switching paths — the switch-and- 
reverse in Fig. 0(c) or switch-and-continue in H(d) - 



is always forbidden since the corresponding off-diagonal 
matrix element of the Heisenberg bond-operator is zero. 
The bounce path in Fig. ||(a) is always allowed since the 
vertex is unaffected (the same spin is flipped twice, re- 
sulting in no net change). The continue-straight path 
||(b) is always allowed if the constant e > so that all 
the diagonal matrix elements in ( |l8| ) are larger than zero. 
If e = at least one of the diagonal matrix elements van- 
ishes, and the continue-straight process is then forbidden 
in some cases. 

If a spin in the state \a) is not acted upon by any of 
the operators in Sm , it cannot be flipped by the operator- 
loop update. Such "free" spins can, however, be flipped 
with probability 1/2 since they do not appear in the 
weight function. Since the average of n, the number of 
operators in Sm, grows linearly with j3, free spins appear 
frequently only at relatively high temperatures. 

It is convenient to define a Monte Carlo step (MCS) 
as a sweep of diagonal updates at all positions in Sm 
where possible, followed by construction of the linked list 
in which a number Ni of operator-loops are constructed 
before mapping back to a new Sm and \a) and flipping 
free spins. Observables can be measured after every, or 
every few, MCS (in some cases, it may even worth-while 
to record measurements after every loop). 

The remaining question now is how many operator- 
loops one should construct in each MCS. The operator- 
loops are typically of highly varying lengths. Each MCS 
should involve several loop updates, so that a significant 
fraction of the vertices are visited. In order not to bias 
the measurements it is important that Ni is fixed. One 
cannot, e.g., keep on constructing loops during a given 
MCS until the number of vertices visited exceeds a pre- 
determined number. The average size of the operator- 
loops depends strongly on the model parameters. It 
is therefore useful to record the loop sizes and periodi- 
cally adjust Ni during the equilibration of the simulation. 
Typically, we determine 2Vj such that the average cumu- 
lative loop length (the number of vertices visited) during 
one MCS is approximately 2(n) or 2M. In recording the 
loop length, we do not count bounces (since no change 
result in the vertex off which the path bounces). Among 
the counted steps there are still some fraction of back- 
tracking ones, i.e., segments of the operator- loop where 
completed vertex updates are reversed. If a bounce oc- 
curs already at the first step the loop closes immediately. 
With our definition, this is a completed loop of length 
0. In order not to bias the measurements, such length-0 
loops also have to be counted among the Ni completed 
loops. 

One could also fix 2Vj based on a criterion involving the 
average number of leg-spins which are actually flipped 
during an MCS, but recording this number is slightly 
more complicated than just keeping track of the loop 
lengths. Since this has to be done only during equili- 
bration the cost is not prohibitive, however. The exact 
definition of Ni and precisely what constitutes one MCS 
are not critical issues (as in the classical Wolff cluster 
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algorithm [|45| , where the MCS can also be defined in a 
way analogous to what we have discussed here). 

The operator-loop construction (the operator path) is 
a type of random walk in a d+l-dimensional space (al- 
though the network of connected vertices does not neces- 
sarily have this dimensionality — it could effectively have 
a fractal dimension < d + 1). One may therefore won- 
der whether the closing of the loop could become prob- 
lematic, especially for large systems in three dimensions. 
In some cases, an operator-loop can indeed become very 
long before it closes. In rare cases a loop may even not 
close during a simulation of practical length. The loop 
size distribution is always very broad, however, and the 
non-closing problem can simply be circumvented by im- 
posing a maximum length beyond which the loop con- 
struction is terminated. The way we typically implement 
this termination is by immediately initiating a new MCS 
(beginning with a diagonal update), hence disregarding 
all the loops that were constructed during the MCS of the 
terminated loop. This way, we do not have to save actual 
operator paths (needed in order to undo the changes done 
during construction of the terminated loop), which would 
become impractical for long paths. The termination does 
not violate detailed balance and hence the correct distri- 
bution of configurations contributing to Z is maintained. 
Termination of incomplete loops does introduce a bias 
in quantities which are related to the extended config- 
uration space, however, such as single-particle Green's 
functions ffifl . Typically, we use a maximum loop length 
w 100 x (n). For the XXZ-model (in any number of di- 
mension) incomplete loop termination is then extremely 
rare (excessively long loops can occur more frequently 
in other models f|§]). The average loop length is typi- 
cally much smaller than (n), but can in some cases be a 
significant fraction of (n) (up to tens of percent). 



C. Detailed balance 

In the originally proposed operator- loop scheme |l8[ , 
the probability of selecting an exit leg is proportional to 
the corresponding matrix element ( p"8| ) when the entrance 
and exit spins have been flipped (with the factor of pro- 
portionality chosen to give probability one for the sum 
of the four probabilities), a specific example of which is 
written in Eq. (^0|). One can prove that detailed balance 
is satisfied in this process by considering an extended 
configuration space which includes also the intermedi- 
ate configurations generated during the loop construction 
(which do not contribute to the partition function). 

The detailed balance proof is illustrated by an exam- 
ple for a configuration with three vertices in Fig. |^(a). In 
1), the leg with the arrow has been selected as the initial 
entrance point of the operator-loop. An exit leg is chosen 
according to the probabilities discussed above. Flipping 
both the entrance spin and the exit spin leads to a new 
configuration in the extended space. In Fig. |(a) the 
three resulting configurations which have non-zero prob- 
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FIG. 5: Two ways to look at the extended configuration space 
generated during operator-loop construction. Examples of 
how the configuration shown in (a)-l is modified at the be- 
ginning of an update in the link-discontinuity picture (a) and 
ladder operator picture (b) are shown. In (a), the arrow in 
1 indicates the proposed starting point of the loop. In (b), a 
first step of flipping the two spins at this link has already been 
carried out (generating ladder operators which are indicated 
by vertices with semi-filled bars) , and the arrow indicates the 
entrance point for the following step. In both (a) and (b), 
configurations that can be generated out of 1 are shown in 
2-4. Link-discontinuities are indicated by small horizontal 
lines in (a). In both cases, configuration 2 corresponds to 
the bounce process, which results in immediate return to the 
original configuration. 



ability are shown in 2)-4). The entrance — > exit paths 
are also indicated and the corresponding spins have been 
flipped. The probability of process 3) corresponds to the 
example given in Eq. (pfj|), which when inserting the ac- 
tual spin states becomes 



(U \H b \ IT) 



at m it) + m 1^1 m + m 1^1 tt>' 



(21) 



In 2), the entrance and exit are at the same leg. This 
is a bounce process which closes the loop immediately 
with no change in the configuration. In 3) and 4) 
the vertex has changed and two links have appeared 
which connect legs with different spin states. We call 
these "link-discontinuities" . Only configurations with no 
link-discontinuities contribute to Z. All configurations 
created during the loop construction contain two link- 
discontinuities, until the loop closes (which can be seen as 
the discontinuities annihilating each other). There are no 
weight changes associated with the link-discontinuities — 
the configuration weight is still considered to be given by 
Eq. (|l^). Hence, the only weight change arises from the 
change in the affected vertex when the entrance and exit 
spins are flipped. 
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The way the exit leg is chosen at the start of the 
operator-loop corresponds to a heat-bath algorithm. The 
probabilities of no change (staying in the original sub- 
space) or transfer to a configuration with two link- 
discontinuities are proportional to the respective weights 
in the extended space. Once a configuration with two 
discontinuities has been created (i.e., the first step was 
not a bounce), we do not want to create more disconti- 
nuities (which would take us out of the extended space 
considered here) and therefore the following updates can 
only take place at the discontinuities (the end points of 
the path), i.e., the discontinuities can be moved. Here 
the same heat-bath algorithm as in the first step is used. 
The only difference is that the entrance leg is not chosen 
at random but is given by a link from the previous vertex. 
Hence, the whole process consists of a series of heat-bath 
steps, which satisfy detailed balance and therefore gener- 
ate configurations according to probabilities proportional 
to the weight in the extended space. The subset of config- 
urations with zero link-discontinuities, which contribute 
to Z ', are therefore also generated with the correct distri- 
bution. The process is ergodic because all types of ver- 
tices can be generated, and since the operator path can 
wind around the periodic boundaries and then change 
both the spatial winding number and the total magneti- 
zation. Within a sector of fixed winding number and 
magnetization, local updates which constitute a small 
subset of the operator-loops suffice to ensure ergodicity 

& 

Instead of thinking about the extended configuration 
space in terms of link-discontinuities, one can consider 
the vertices created when one of the spins in the original 
vertices of Fig. [j] is flipped. These new vertices corre- 
spond to the single-spin flipping (ladder) operators 
and Sf. The loop construction can be formulated in 
terms of introducing pairs of these, which are then ran- 
domly propagated until they reach the same vertex and 
annihilate each other. The start of such a process is il- 
lustrated in Fig. ||(b), using the same configuration and 
starting point as in ||(a). The difference with respect 
to the previous discussion is that now there are no link- 
discontinuities. Instead, the spins at both ends of the 
link at the selected entrance leg are flipped simultane- 
ously This introduces two ladder vertices. Here one has 
to assign a value, w;, to the matrix elements of the ladder 
operators (i.e., the new operators are viS^~ and viS~). 
The initial loop segment, an example of which is shown 
as I) in Fig. ||(b), is then generated only with a probabil- 
ity min[l, vf /(W1W2)], where W\ and W2 are the matrix 
elements corresponding to the two vertices that are con- 
sidered for replacement by ladder vertices. If this first 
step is accepted the next step is again to choose an exit 
leg. As before, the propagation of the path is carried out 
according to a heat-bath algorithm, with probabilities 
proportional to the matrix element when the entrance 
and exit spins have been flipped. In the example, paths 
2) and 3) lead to closed loops (back to the space with 
no ladder operators), whereas in 4) the ladder operators 



are moved further away from each other. Note that both 
spins on the link corresponding to the exit leg are flipped 
in every steps, so that no link-discontinuities appear. The 
process continues until the two ladder operators are on 
the same vertex, which then becomes equal to one of the 
original two-spin vertices. This brings the system back 
into the original configuration space. 

The link-discontinuity and ladder operator pictures 
of the loop construction are clearly completely equiv- 
alent, although the probabilities associated with start- 
ing (or closing) the loop are different. In actual simu- 
lations it is typically more convenient to use the link- 
discontinuities view. The ladder operator picture explic- 
itly relates the extended configuration space to that of 
correlation functions involving these operators, but the 
link-discontinuities can be easily related to them as well. 
An extended configuration space analogous to the one 
generated in the operator-loop update was first utilized 
in the context of the worm algorithm for continuous-time 
world-line simulations The issue of measuring off- 
diagonal correlation functions using the SSE operator- 
loops has been considered in Ref. |44j. 

In Sec. Ill we will give a more formal and complete 
proof of detailed balance. We will show that the heat- 
bath algorithm is not the only, and also not the most 
efficient, way to satisfy detailed balance when construct- 
ing the operator-loop. We will introduce the concept 
of a directed loop to form a general framework for loop 
updating schemes, both in SSE and path integral simula- 
tions. In the SSE scheme, the directed loop simply leads 
to different probabilities of choosing among the four exits 
from a vertex, all other aspects of the method remaining 
as has been discussed in this section. Before introducing 
the directed loop concept, we first consider special cases 
in which the bounce process can be excluded. In II E we 
discuss more technical SSE implementation issues, which 
also are common to the heath-bath operator loops and 
the new directed loops. 



D. Excluding back-tracking in special cases 

In the general operator-loop algorithm discussed 
above, the probability of the bounce process is always 
non-zero, because the vertex remains unchanged and has 
a non-zero value (otherwise, it would not appear in the 
configuration in the first place). In some special cases, it 
is possible to modify the algorithm in such a way that the 
bounce is completely excluded. This has very favorable 
effects on the simulation dynamics, since there is then no 
back-tracking and all segments of the loop accomplish 
changes in the configuration. 

The most important of the special cases is the isotropic 
Heisenberg model (A = 1, h = 0) |l8|]. A very similar al- 
gorithm exists for the ferromagnet (J < 0) p3j| . For the 
antiferromagnet, choosing the constant e = in Eqs. ( |l8| ) 
implies that the vertices with all spins up or all spins 
down vanish and the remaining four matrix elements all 
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FIG. 6: All allowed vertices in the deterministic operator-loop 
algorithm in the case of the Heisenberg antiferromagnet (a) 
and ferromagnet (b). Operator-loop segments starting at the 
lower left leg are also shown. 



equal 1/2. As a result, the matrix element product in 
( p7| ) is simply (l/2) n and is not affected by the operator- 
loop update. If the bounce process is excluded, the 
only remaining process is the switch-and-reverse shown in 
Fig. ^(a) and the path is hence deterministic. The actual 
loop structure is only changed by the diagonal update. 
The deterministic loop process is clearly symmetric with 
respect to flipping or flipping back the spins at all vertex 
legs covered by the loop and hence it obeys detailed bal- 
ance. For the ferromagnet, the bounce can be excluded 
if C = —1/4 in (0) [for the isotropic ferromagnet A = — 1 
and there is no minus sign in (0)], and the only remaining 
process is then the switch-and-continue process shown in 
Fig. |(b). 

In the deterministic case, each vertex leg can be 
uniquely assigned to a loop, and the loops can be flipped 
independently of each other. Instead of randomly choos- 
ing starting points and constructing a fixed number of 
loops, one can then construct all possible loops exactly 
once, by always picking a starting point which does not 
belong to a loop already constructed. The loops should 
then be flipped with probability 1/2. The random deci- 
sion of whether or not to flip can be made before the loop 
is constructed, but even if the decision is not to flip one 
has to construct the whole loop and set flags on the ver- 
tex legs visited, so that one does not attempt to construct 
the same loop again. Loops are constructed this way un- 
til all An vertex legs have been visited. This method 
of constructing all the loops is analogous to the classi- 
cal Swendsen-Wang multi-cluster method |4^| , whereas, 
as was already mentioned above, the operator-loop con- 
struction in the general non-deterministic case is more 
similar to the Wolff single-cluster algorithm j45| . 

It should be noted that in the deterministic case an 
algorithm including only operator updates (diagonal up- 
dates and loops) is not completely ergodic. In the an- 
tiferromagnet, spin states with all spins up or down are 
isolated from the other states since no operators can act 
on them. These two states are important only at very 
high temperatures and they can then be reached by also 
performing random flips of "free" spins. In simulations 
with e > all states can be reached even without such 
spin flips. 

Another special case is the XY-model ]18|, (A = 0, 



h = 0). In this case all matrix elements in (18) equal 1/2 



if the constant e = 1/2. The weight is then again only 
dependent on n and does not change in the operator- 
loop update. The bounce can therefore be excluded also 
in this case, leaving two remaining allowed exits from 
each vertex. Although these paths are not deterministic, 
one can still subdivide the system into loops that can be 
flipped independently of each other. 

The loop structure in the general operator-loop al- 
gorithm, which includes bounce processes, is similar to 
that in the worm algorithm for continuous-time path- 
integrals |3| , although the two methods are quite different 
in other respects (the actual processes used to construct 
the SSE operator-loops and the worms are different — see 
Sec. VII). In the special cases where the bounce process 
can be excluded, the SSE operator-loops are analogous to 
the world-line loops (in discrete jl6| or continuous ||, |T^] 
imaginary time). The close relationships between the 
Euclidean path integral in continuous time and the dis- 
crete representation on which the SSE method is based 
has been discussed in previous papers ||, and will 

also be further elucidated here in Sec. IV. 



III. DIRECTED LOOPS 

In the operator-loop update discussed in the previous 
section, detailed balance is satisfied using a heat-bath 
algorithm for propagating the path between connected 
vertices. In this section we will present a more general 
set of equations that have to be satisfied for detailed bal- 
ance to hold in such a process. We will show that these 
equations have an infinite number of solutions, some of 
which can lead to a more efficient sampling than the heat 
bath. We construct a particular solution based on the in- 
tuitive hypothesis (for which we have no rigorous proof) 
that the probability of bounces (back-tracking) should 
be minimized. We show that the bounces can in fact be 
completely excluded in a much wider range of parame- 
ters than at the two isolated points (isotropic XY and 
Heisenberg) discussed in Sec. II D. 

We call the entities involved in the more general 
scheme "directed loops", because the detailed balance 
equations that we construct (the directed loop equations) 
explicitly take into account the fact that the construction 
of the path of vertices is directional, i.e., the probability 
of exiting at a particular leg given the entrance leg is not 
the same as the probability of the reverse process. The 
original operator-loop update with the heat-bath proba- 
bilities [Q discussed in the previous section corresponds 
to a particular solution of the directed loop equations. 
We stress that if another solution is used, the only differ- 
ence in the actual simulation with respect to the original 
scheme is a different set of probabilities for exiting at 
a given vertex leg, given the entrance leg and the four 
spin states. Before we explicitly construct new solutions 
in the context of the XXZ-model we begin by describing 
more generally how the directed loop equations arise. 
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A. Conditions for detailed balance 

Let us first recall that the detailed balance requirement 
reads 



P(s -> s')W(s) = P{s' -> s)W(s'), 



(22) 



where s denotes a configuration having weight W(s), 
which in the SSE method is expressed as a product over 
vertex weights, Eq. (|l7|), and P(s — > s') is the proba- 
bility of changing the configuration from s to s'. While 
the weights are given by the Hamiltonian the probability 
for how to update the spin configuration depends on the 
actual algorithm used. 

The algorithm for constructing an operator-loop to 
update an SSE configuration is quite general for any 
form of the 2-body interaction (and can be extended also 
to multi-particle interactions). With the configuration 
mapped onto a linked vertex list, an initial entrance ver- 
tex leg is first picked at random among all An legs. Then 
an exit leg belonging to the same vertex is chosen in a 
probabilistic way and the spins on the entrance and exit 
legs are flipped with unit probability. More generally, 
the states at these legs are updated with non-zero prob- 
abilities only for changes leading to vertices correspond- 
ing to non-zero matrix elements. For simplicity, we here 
assume that the change at the exit leg is uniquely dic- 
tated (through conservation laws) by the change at the 
entrance leg. The process continues using as the new en- 
trance leg the leg linked to the exit leg. The process stops 
when the initial starting leg is reached. The probability 
for arriving at a new configuration s' can therefore be 
written as 



P(s 



^2 P ( e o) P ( s ' e ° ~" s i> e i) 
xP(si,e 1 -> s 2 ,e 2 ) ■ • • 
xP(s„-i,e n _i — > s',e ), 



(23) 



where P(eo) is the probability for choosing the vertex leg 
eo as the initial starting point and P(sj, — > Sj+i, e^+i) 
is the probability given a spin configuration Si and the 
entrance leg to exit the vertex at Xi which is connected 
to the next entrance leg ej+i, resulting in a new config- 
uration Sj+i. The intermediate configurations Sj belong 
to the extended space of configurations with two link- 
discontinuities, as discussed in Sec. II-C. The exit legs 
Xi do not explicitly appear in the probabilities since we 
have assumed that they are uniquely linked to the fol- 
lowing entrance legs e^+i (generalization to cases where 
the uniqueness does not hold are straight-forward). The 
sum is over all possible closed loops which result in the 
updated configuration being the particular configuration 
s' . To find a convenient way of choosing the probabilities 
on the right hand side of the above one needs an expres- 
sion for the inverse process where the spin configuration 
s' is transferred into s. This can be written down quite 
simply by realizing that for each of the terms in Eq. ( [23] ) 
there is a corresponding term which describes the "time"- 



reversed path, which contributes to the reverse probabil- 
ity. Thus one can write 

P{s'^s) = ^P(e )P(s / ,e ^s n _i,e n _ 1 ) 

x ••• x P(s 2 ,e 2 -> si,ei) (24) 
xP(si,ei -> s,e ), 

where the sum is over the same closed loops as in 
Eq. (p3|). By inserting these expressions into the detailed 
balance equation (22) we see that detailed balance is sat- 
isfied if 



W(si)P(si,ei -> s i+ x,e i+ i) = 
W(s i+ i)P(s i+ i,e i+ i -> 8i,ei) 



(25) 



for all possible SSE configurations and entrance legs. Be- 
cause the update (si,ej — * Sj+i, ej+i) changes only one 
particular vertex, all except one of the factors in the prod- 
uct of vertex weights in Eq. (|l7]) factor out and cancel. 
Writing W(s, e, x) = W s P(s,e — > s',x), where we have 
slightly changed the notation so that W s denotes the ma- 
trix element corresponding to a single vertex with its four 
leg states coded as s, e is the entrance leg, and x is the 
exit leg on the same vertex, one can formulate the de- 
tailed balance criterion Eq. (p5|) as 



(26) 



W{s,h,l 2 ) = W{s' ) l 2 ,h) 



which should be valid for all possible vertex types which 
can be converted into each other by changing the states at 
the entrance and exit legs. This equation implies many 
relations between the unknown probabilities of how to 
chose an exit leg given a particular vertex and an en- 
trance leg. There are additional relations which must be 
satisfied. Requiring that the path always continues thru 
a vertex translates into into 



;p( s , e 



1. 



(27) 



where the sum is over all legs on the vertex. We have 
emphasized in the notation s x that the resulting spin 
configuration depends on the exit leg. In terms of the 
weights W(s, ii,Z 2 ) this requirement translates into 



W(s, e, x) = W s 



(28) 



which must be valid for all vertices and entrance-legs. 
This set of equations, Eq. ( p8| ) together with the rela- 
tions in Eq. (|26|), form the directed loop equations, the 
foundations of our new approach to construct valid prob- 
ability tables for the operator-loop update. 



B. SSE directed loops for the XXZ-model 

For the XXZ-model there are just three possible exits 
for any given entrance leg as one choice always leads to a 
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FIG. 7: Example of two vertices with directed loop segments 
that transform into each other in the flipping process. 
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FIG. 8: Possible assignments of directed loop segments for 
half of the different combinations of vertices and entrance- 
legs. The other half of the vertex configurations can be ob- 
tained by interchanging up and down spins (solid and open 
circles) while keeping the arrows. The lines with arrows are 
the directed loop segments. The configurations are divided 
into four sets (one in each quadrant). On flipping the spins 
connected by the loop segment and reversing the direction of 
the arrow, only configurations within the same quadrant are 
transformed into each other. 



zero weight state when spins connected by the loop seg- 
ment are flipped (due to violation of the z-magnetization 
conservation of the model). Fig. || illustrates the pos- 
sibilities for placing directed loop segments for different 
vertices. In order for our update process to satisfy de- 
tailed balance we recall that according to Eq. (|2^) we 
must relate vertices in which the two spins connected by 
the loop segment arc flipped and the direction of the loop 
segment is reversed. Such related configurations are il- 
lustrated in Fig. Furthermore Eq. (|2^) relates vertices 
with different exit legs having the same spin configura- 
tion and entrance legs. We then make the key observation 
that all possible vertex configurations can be divided into 
eight subsets that do not transform into each other. Half 
of these subsets are shown in Fig. ^, where only config- 
urations within the same quadrant are transformed into 
each other. These configurations form closed sets under 
the flipping operation. It is therefore sufficient to derive 
the detailed balance conditions, Eq. (^6|), for transitions 
between vertex configurations in the same set indepen- 
dently of other configurations. 

A row in any of the quadrants in Fig. H contains all 



three configurations which can be reached by entering a 
certain vertex from a certain entrance leg. For instance, 
in the upper left quadrant the entrance leg for the first 
row is the lower left one, for the second row the lower 
right one, and for the third row the upper right one. Ac- 
cording to Eq. (p8|), summing the weights of all possible 
configurations that can be reached from a certain in-leg, 
keeping the spin configuration fixed, should equal the ver- 
tex weight alone. Thus taking the upper left quadrant of 
Fig. H, we have for rows 1-3 from the top: 



Wi = h+a + b, 
W 2 = a + b 2 + c, 
W 3 = b + c + b 3 , 



(29) 



where the symbols on the left hand sides are the vertex 
weights Eq. ( |lS| ) in the spin configuration space, i.e., 

w x = (ll \H b \ IT) = (TT \H b \ TT) = 1/2, 

w 2 = (TT \H b \ TT) = (TT \H b \ TT) = A/2 + h b + e, 

w 3 = (TT \H b \ TT) = e, (30) 

Wi = (TT \H b \ TT) =e + 2h b , 

while those on the right are weights in the enlarged con- 
figuration space of spins and directed loop segments. We 
have assigned equal weights to the configurations which 
are related by flipping, in accordance with Eq. (p6|). The 
order of the symbols on the right-hand sides of Eqs. JpjJ) 
follows the order in the upper left quadrant of Fig. Hso 
that, e.g., the weight of the two configurations in Fig. 
is b and the weight of the very middle configuration in 
the upper left quadrant of Fig. ^ is b 2 . We use b with a 
subscript to denote a weight of a configuration where the 
exit equals the entrance (bounce process). 

As mentioned above there are in all eight sets of ver- 
tex configurations which close under the flipping process. 
These sets are in principle independent of each other and 
have their own equation sets. However, one can easily 
convince one-self that because of symmetry reasons there 
are only two different types of sets. One of these sym- 
metries is that of permuting the two spins acted upon 
by H b . This implies that the equations derived for the 
set in the upper left quadrant in Fig. ||, Eqs. (^9|), are 
the same as for the set (not shown) that can be obtained 
from the upper right quadrant by interchanging up- and 
down spins, keeping the orientation of the directed loop 
segments. The other symmetry is that of imaginary time 
inversion, which in the figures corresponds to switching 
the pairs of spins below and above the horizontal bar 
representing the operator H b . This symmetry together 
with the previous one identify the rules for the upper left 
quadrant of Fig. |] with those of the lower right quad- 
rant. Thus, one only has to consider two independent 
sets of equations, Eqs. ( ^9|) and the corresponding equa- 
tions which can be derived from the lower left quadrant 
in Fig. g: 



Wi = b[ 



b\ 
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w 2 
w 4 



a' + b' 2 + c! 
b' + c' + bL 



(31) 



This latter set takes the form of the set (|29|) but with 
primed symbols to denote the weights and W4 instead of 
W 3 . There is a further reduction in the case of zero mag- 
netic field, where the two equation sets become identical. 

Before discussing solutions to these sets of equations 
it should be stressed that the actual probabilities for se- 
lecting the exit leg is given by dividing the weight in 
the extended configuration space by the weight of the 
bare vertex, so that, e.g., the probability for choosing 
the "bounce" process given that the entrance leg is the 
lower left one on a vertex with weight Wi, as shown in 
the very upper left hand corner of Fig. ||, is b\jW\. 

It is clear that there are many solutions with only posi- 
tive weights to the above equation sets as they are under- 
determined; both sets have six unknowns and there are 
three equations with the additional requirement of non- 
negative weights. A particular symmetric solution is the 
one corresponding to the heat-bath probabilities used in 
the original scheme fl8|| , which we will henceforth refer 
to as Solution A. It is given by 



a = WiW 2 /(W 1 + W 2 + W 3 ), 
b = W 1 W 3 /(W 1 + W 2 + W 3 ), 
c = W 2 W 3 /{W 1 + W 2 + W 3 ), 
b t = W?/{W 1 + W 2 + W 3 ). 



(32) 



For the primed weights, W3 is replaced by W4. Clearly 
the probabilities for choosing the exit leg are here propor- 
tional to the weights of the resulting bare vertices which 
are obtained by flipping the two spins on the loop seg- 
ment, an example of which was given in Eq. (|2C|). This 
solution is valid in the full parameter space of the XXZ- 
model. However, it generally assigns a relatively large 
weight to the bounce processes where the exit leg equals 
the entrance leg. These are ineffective in updating the 
configurations. In particular, when the field h — > and 
the anisotropy A — > 1 the bounce probability approaches 
1/2. Although the method still is reasonably efficient (we 
are not aware of any method that has been more success- 
ful for models including external fields), this is bother- 
some since the SSE algorithm exactly at h = can be 
formulated entirely without any bounce processes ||l8f , as 
reviewed in Sec. II D, and is then considerably more effi- 
cient. The fact that the h — scheme has no bounces and 
is completely deterministic, whereas the h — ► method 
has bounce probabilities approaching 1/2, inspires us to 
look for solutions where the bounce probability instead 
vanishes continuously as h — > 0. This will eliminate the 
"algorithmic discontinuity" of the previous approach. 

For the discussion of other solutions to the directed 
loop equations (|29|) and ([3l|) it is convenient to express 



these equations in terms of the bounce weights b\ , 
1 + A h b -61 - b 2 + b 3 
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FIG. 9: "Algorithmic phasediagram" showing regions where 
various bounce weights must be non-zero. The actual values 
of these weights are given in Table I. In the shaded region all 
bounce weights can be set to zero. Other bounce weights are 
listed in Table I. 
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where we have explicitly inserted the expressions for the 
vertex weights, Eq. (|30|). We seek positive solutions to 
these equations. Being under-determined there are many 
solutions, so we will try to find the solutions that yield 
the most effective algorithms. As a general principle for 
finding efficient rules, we will attempt to minimize the 
bounce weights b\ 1 . . . , b' 3 . The solution so obtained will 
be termed Solution B. Inspecting the equations, it is clear 
that there is one region in parameter space where one can 
avoid bounces altogether. This region is shown as the 
shaded region in Fig. |^. From the requirement of non- 
negative vertex weights we already have the restriction 
e > 0. In the shaded region, the requirement of non- 
negative weights also in the enlarged configuration space 
when all the bounce weights are zero imposes an addi- 
tional constraint on e: e > (1 — A)/4 — hb/2. We have 
no rigorous principle of finding the optimal value of e in 
general, but as can be inferred from our simulation tests 
(presented in Sec. V) it is often advantageous to choose 
a small but non-zero value in cases where e m i n = 0. 

For the Heisenberg antiferromagnet at zero magnetic 
field (A = 1 , h = 0) the deterministic algorithm con- 
structed in Ref. |l8| is recovered for the choice e = 0. 
The non-zero weights are then a = a' = 1/2, while the 
non-zero matrix elements are W\ — W 2 — 1/2, which 
correspond to the switch-and-reverse process illustrated 
in Fig. ^(a). This is a deterministic algorithm as the 
only probabilities different from zero are unity. There 
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TABLE I: Non-zero bounce weights and minimum values of 
e for the different parameter regions of Solution B of the di- 
rected loop equations. The Roman numerals correspond to 
those in Fig. §. We have defined A ± = (1 ± A)/2. 
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is a subtlety here as the ratio C/W3 is undetermined for 
e = 0. However, the value of this probability can be left 
undetermined as the vertex with all spins down will not 
be generated as a consequence of the vanishing of the 
weight W3. This is actually more general — whenever a 
probability cannot be defined because of a zero denomi- 
nator it can be left undetermined because the probability 
of reaching such a vertex is zero in the first place. For 
the XY-model (A = 0) at zero magnetic field the choice 
e — 1/4 gives a different set of zero-bounce rules than 
the one proposed in Ref. |l8[ which, however also is a so- 
lution of our equations (but with e = 1/2). It is quite 
remarkable that for the XY-model one can in fact find 
rules with no bounces for all magnetic field strengths up 
to the saturation field. We expect this to be very useful. 

Outside the shaded region in Fig. ^ one or more bounce 
weights must be non-zero. In these regions we will 
again choose the smallest possible values for the bounce 
weights. Table | shows these values for the different re- 
gions in Fig. ^, along with the minimum value of e al- 
lowed. Selecting a value for e, the remaining weights can 
be obtained using Eqs. (33). 

At the boundary between regions in Fig. ^[ one of the 
bounce weights vanishes continuously. In particular this 
means that the rules for the Heisenberg antiferromag- 
net in a magnetic field approaches the rules in zero field 
continuously as hb — > 0. This is to be contrasted to the 



symmetric Solution A, Eqs. (32), where the bounce prob- 
abilities approach 1/2 as h^ — » 0. Hence, the algorithmic 
discontinuity is indeed removed as the special determin- 
istic solution at the isotropic point is recovered automat- 
ically with Solution B (when e = 0). 

In Sec. V, the performance of simulations using Solu- 
tions A and B will be quantified in terms of calculated 
autocorrelation functions. It will be shown that the new 
Solution B can lead to autocorrelation times more than 
an order of magnitude shorter than with Solution A. The 
improvements are most dramatic for weak but non-zero 
fields and weak Ising anisotropies (A > 1). In Sec. IV we 
will describe how the directed loops also can be adapted 
to simulations in the path integral formalism. Below we 
first briefly discuss the form of the detailed balance equa- 
tions for more general Hamiltonians. 



C. General form of the directed loop equations 

The directed loop approach can be easily applied to a 
much wider class of models than the 5 = 1/2 XXZ-model 
discussed in the preceding section. The SSE operator- 
loop update with the heat-bath probabilities jll| has al- 
ready been applied to several different systems, including 
spin systems with S > 1/2 [Q, various boson models 
]3q , [37j, as well as the ID extended Hubbard model 
We here briefly outline the general form of the di- 
rected loop equations and their solutions for a general 
2-body interaction. 

When the operator-loop update is applied to models 
with higher spins, boson or fermion models, it is clear 
that the simple notion of flipping a spin in the S = 1/2 
XXZ-model must be extended to a change in the state at 
a vertex leg where the final state is to one out of several 
possible ones. Consider as an example a spin-1 model 
where a loop can change the state on a leg by one or two 
units of spin. This is simplified when the total S z is con- 
served as then these different changes can be considered 
as two independent loop-updates. This is because chang- 
ing the state on the exit leg by two units of spin when 
the state on the entrance leg is changed by one unit, or 
vice-versa, violates the S z conservation law. Thus, with 
such a conservation law the state change of the exit leg 
is uniquely determined given the state change at the en- 
trance leg. For simplicity we will here consider only those 
cases where this uniqueness holds, although this is by no 
means a necessary condition. 

In order to describe the general form of the directed 
loop equations for this type of general two-site interaction 
it is convenient to change labeling somewhat from that 
used in the previous section. To define this new labeling, 
we start by selecting a reference vertex (which can be 
any of the allowed vertices) and label its weight W\ . We 
then choose an entrance leg and label this leg as leg 1, 
and then number the rest of the legs on this vertex 2, 3 
and 4. Distributing the weight over all possible exit legs 
according to Eq. (E8T) gives 



Wi — an + + a 13 + a 14: , 



(34) 



where we have labeled the weights ay in the extended 
space by their entrance (i) and exit (j) legs. On changing 
the states at both the entrance and exit legs one arrives 
at a new vertex. If the entrance and exit legs are the 
same the vertex stays the same. Now label the weight of 
the vertex reached by exiting at leg i as Wi. Thus if the 
exit was on leg 2 we would label that vertex W% . W2 has 
a similar decomposition as W\, 



W 2 = a 2 i + 0-22 + a 2 3 + a 2 4, 



(35) 



where now the entrance is on leg two on the vertex which 
differs from vertex 1 by having changed the states at leg 1 
and 2. The weight 021 corresponds to the process where 
the path enters at leg 2 and exits at leg 1. The states are 
changed in the opposite way to that when arriving at W2 
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from Wi, and hence the process is undoing the changes 
and we arrive back at W% . From Eq. ( p6|) it follows that 
0-21 = a i2- Now one can ask the question if exiting at leg 
3 or 4 yields the same vertex type when starting from W<i 
as it does starting from W\. The answer to this is yes, 
because starting from Wi one would change the state at 
leg 1 and 3 while starting from Wi one would change the 
states at legs 2 and 3. But W2 differs from Wi only by 
having different states at legs 1 and 2 and thus the state 
at leg 2 is changed twice in opposite directions resulting 
in the same configuration W3. The weights are hence 
uniquely defined by this procedure, and one is guaran- 
teed that the only vertices which are related by the de- 
tailed balance equations are those which can be reached 
by changing the state on the entrance leg together with 
the state on any exit leg of the reference vertex. The 
directed loop equations can therefore be written as 



/ an ai2 ai 3 a 14 

0-12 0,22 «23 a 24 
Ol3 a23 0,33 0,34 
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(36) 



where the matrix on the left hand side is a real sym- 
metric 4x4 matrix with all entries non-negative for a 
useful algorithm. The magnitudes of the diagonal ele- 
ments determine the bounce probabilities. This is the 
general structure of the directed loop equations for 2-site 
interactions. There are in general several such sets of 
equations, which can be generated one-by-one by chang- 
ing the reference vertex and the type of change at the 
entrance leg. The reference vertex should then of course 
be chosen among vertices that have not yet been gener- 
ated starting from another reference vertex, in order not 
to generate the same equation sets several times. Some 
of the different sets are typically identical to each other 
by symmetry, as in the case of the 5=1/2 XXZ model 
where there are eight sets falling into two classes. In that 
case the structure of the equations changes into 3x3 
forms because there are only three allowed exit possi- 
bilities for each entrance leg. To explain this with an 
example in the scheme used here, we can consider the 
vertex with all spins down as the reference vertex. Then 
W2 = as this configuration corresponds to the case 
where the lower legs (1 and 2) are flipped, resulting in a 
vertex with weight zero. This immediately implies that 
all a's (being all non-negative) with an index 2 must be 
zero and so the result is that row 2 and column 2 is taken 
out resulting in a 3 x 3-matrix. In general, there can be a 
large numbers of 4 x 4 equation sets, some or all of which 
reduce into 3x3 and 2x2 sets (e.g., for Hubbard-type 
electron models there are both 2x2 and 3x3 sets, but 
no 4 x 4 sets). 

Let us consider the 3x3 case in greater detail and ask 
when one can do without bounces, as we saw was possible 
in a region in parameter space of the S — 1/2 XXZ- 
model discussed in the previous section. To do this, it is 
convenient to first label the equations so that W3 > W2 > 



Wi- We then set all the diagonal entries (the bounce 
weights) to zero and find the region of different VF's for 
which the equation set has strictly positive solutions. In 
this case the solution is unique as there are 3 equations 
and 3 unknowns and it is easy to see that the a's are 
positive only when W3 < Wi + W2, and hence one find 
a directed loop solution without bounces only when this 
condition is satisfied. 

Allowing bounces, it is also easy to see that one can 
always do with only one bounce, the one which bounces 
off of the vertex with the largest weight. If W3 is the 
largest weight one can set an = 022 = and 033 — 
W3 — W\ — W2, which gives = 0, a i3 — W\ and 023 = 
W%. This means that the probability for moving between 
the configurations with smallest weight is zero while that 
of moving from the largest weight configuration to the 
smaller ones is the ratio of the smaller weight to the larger 
weight and unity for the reverse process. The bounce 
probability is unity minus the probability for moving to 
the smaller weight configurations. A similar analysis can 
be carried out for the 4x4 equation sets appearing for 
S > 1/2 models and boson models. 

The equation sets involving larger matrices, as encoun- 
tered when dealing with interactions involving more than 
two sites, can also be studied in a similar manner. It 
should be pointed out that there is nothing that guaran- 
tees a priori that the operator-loop update is ergodic (in 
combination with the diagonal updates), for any solution 
of the directed loop equations. Ergodicity requires that 
all allowed vertices can be generated through a series 
of loop updates, and this is typically the case with 2- 
particle terms (although one could in principle construct 
models where it is not the case). However, simple one- 
dimensional loops such as those discussed here cannot al- 
ways accomplish this alone when the interaction includes 
more than two particles, even in the case of relatively 
simple models. The SSE method has recently been ap- 
plied to an XY-model with a standard 2-spin interaction 
J and a 4-spin term K |Is|j . In that particular case, 
an operator-loop update can be used, and is ergodic, for 
|J| > 0, but for J = another cluster-type update had 
to be carried out. In practice, a combination of the two 
updates had to be used for large K/ J. 



IV. PATH INTEGRAL FORMULATION 

In this section we will discuss how the directed loops 
can be applied to the path integral Monte Carlo method 
(PIM) formulated in imaginary time. Such methods are 
known as world-line methods in discrete Q or continuous 
H H imaginary time. The close relationships between 
the SSE and PIM representations of quantum statistical 
mechanics have been explored in previous works |42| , [i"7| . 
Here we will show that also the directed loop ideas can 
be almost directly translated from SSE into the PIM for- 
malism. 



A. Construction of the path integral 

We start by writing the partition function as 

Z = Tr{e-^} = Tr|[] e - ArH |, (37) 

where At = (3/L and L is a large integer. The Hamil- 
tonian is generally a sum of non-commuting pieces, and 
in order to deal with the exponential it is convenient to 
employ the Suzuki- Trotter trick . This involves divid- 
ing the Hamiltonian into several sets of terms, where all 
terms within a set are commuting while the sets them- 
selves are non-commuting. Because the Hamiltonian is 
multiplied by the small quantity At it is possible to split 
the exponential into a product of exponentials, each hav- 
ing one set in the exponent. The errors arising from this 
approximation vanishes as At — > || ^(J. Consider as 
an example the XXZ chain. Then the Hamiltonian can be 
divided into two sets, one involving the operators which 
act on sites 2n and 2n + 1 while the other set involves 
the operators acting on sites In + 1 and 2n + 2. It is 
then possible to insert complete sets of states, which can 
be chosen to be written in terms of S^-components, be- 
tween all the exponentials and the partition function can 
be written @, |, |, [To) 

L 

z = e n^+ii^ ArH2 i^+i/2)(^+i/ 2 i e - Arffi kt>, 

{<r} t=l 

(38) 

where a is a shorthand for a spin configuration in the S z - 
basis of all sites in the chain. The sum is over all possible 
sets of spin configurations, two complete sets of states for 
each time step t, and the trace implies ol+i = cr%. This 
is called the checkerboard breakup as one can visualize 
it as a checkerboard pattern (see Fig. |l^) where all the 
matrix elements are pictured as shaded plaquettes. This 
breakup is completely general and can also be used for 
higher-dimensional lattices. Because each set Hi and H2 
consists of individually commuting terms it suffices to 
consider the interaction on one shaded plaquette only and 
the matrix elements can easily be written down. Keeping 
only terms to first order in At one finds 

Wi = (Tl \e- ArH \ IT) - (IT \e- A ™\ U) = At/2, 

W 2 = (Tl \e- A ™\ Tl) = (IT \e- ArH \ IT) 

= 1 + (C + A/4)At, (39) 

W 3 = (|| \e- A ™\ ||) = 1 + (C- A/4 -hi) At, 

Wi = (TT |e~ ArH | TT) = 1 + (C- A/4 + /i 6 )At. 

These matrix elements differ from the matrix elements 
( |l8| ) in the SSE method only in that the Hamiltonian 
is multiplied by the factor At and the diagonal ma- 
trix elements also come with the zeroth-order term of 
the exponential. The weight W% comes with a minus 
sign which here is omitted by implicitly performing a 7r- 
rotation about the .S^-axis for spins on one sub-lattice. 



14 




12 3 



FIG. 10: The checkerboard breakup of the space-time for a 
spin chain with four sites with open boundary conditions. Hi 
have terms acting on the links between site and 1 and the 
link between site 2 and 3. H2 acts on the link between site 
1 and 2. The shaded plaquettes show where the Hamiltonian 
acts. 

m m 
m m 

FIG. 11: Loop and spin configurations which should have the 
same weight when allowing the loops to be flipped indepen- 
dently. 



This can be done whenever the lattice is bipartite. One 
can of course also calculate the matrix elements ( |39| ) ex- 
actly, but since we will here take the continuum limit it is 
sufficient to go to linear order in At, where the similarity 
to the SSE expressions are most evident. 

In the ordinary world-line loop algorithm (for a review 
see Ref . E{| ) , two loop segments are assigned to each and 
every shaded plaquette in a stochastic way. The shaded 
plaquettes are corner-sharing so that when all shaded 
plaquettes have been assigned segments one can identify 
closed loops. Given that the probabilistic rules for the 
assignment of loop segments for each shaded plaquette 
follows the analogy of Eq. (^6|) and Eq. (pq), one can 
flip a loop with any probability. In particular one can 
pick a random site and a random imaginary time and 
flip the loop which includes this point with probability 
unity. One can also turn this around and first, before 
any loop is constructed, pick a random point in space- 
time and then construct the loop starting at this point 
and flipping spins with probability unity as the loop is 
being constructed. 

When assigning loop segments to each shaded plaque- 
tte one needs two loop segments for each plaquette in 
order to fill the lattice completely. Then many config- 
urations can be reached, as one should be able to inde- 
pendently flip spins along one or both the loop segments. 
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Thus one gets relatively many constraints of the type 
([2(f). This is illustrated in Fig. O. In fact, in zero field 
there are just as many equations as unknowns, and this 
set has only non-negative solutions in the XY-like case 
— 1 < A < 1. In a magnetic field there is one additional 
equation and the set does not have any solutions. Within 
the standard loop algorithm this is repaired by introduc- 
ing additional processes which "freeze" loops together, 
i.e., if spins on one loop is flipped, spins on any loops 
frozen together with the first one will also be flipped. 
This increases the number of unknowns in the equation 
set, making a solution possible. While we are not aware 
of any systematic studies of the effects of the freezing 
process, it tends to freeze all loops together resulting in 
the trivial spin update where all spins are flipped. It 
is therefore not very effective. However, in the extreme 
Ising limit the freezing is responsible for the fact that 
the loop algorithm becomes equivalent to the Swendsen- 
Wang algorithm, and hence the freezing of loops has some 
merits. 



Another method to make the loop algorithm work in 
a magnetic field is to apply the field in the x-direction, 
thereby changing the matrix elements and introducing a 
minus sign. Using the concept of merons the resulting 
sign problem can be solved pl[ p2| , but the simulation 
algorithm is not very efficient for large systems. If one re- 
laxes the condition that the loops should be flipped with 
probability one and instead chooses weights such that 
the flipping probability is maximized, it is possible to find 
rules that work very well at extreme fields p9| . However, 
this success at extreme fields must be regarded as a lucky 
circumstance and is not generally valid for lower fields. 
Yet another and perhaps the simplest loop method in the 
presence of a magnetic field is to construct the loops as if 
the field was absent and then include a Metropolis deci- 
sion whenever attempting to flip a loop that changes the 
magnetization. This method is, however, very ineffec- 
tive Q (except at extremely weak fields; h/J < 1/((3N) 
p5[) as is to be expected as it does not take into account 
the actual physics of the model which is the competition 
between the magnetic field and the exchange energy. 



None of the above methods for treating external fields 
has proven as useful in practice as the SSE operator-loop 
algorithm |L8). The worm algorithm for path integral 
simulations in continuous imaginary time |3j shares some 
important features with the SSE operator-loops (specifi- 
cally, there is an analogue to the back-tracking) and has 
also been used successfully. However, its autocorrelation 
times appear to be longer (as can be seen in comparing 
our results in Sec. V with those presented in Ref. 40). We 
will discuss differences between the procedures used to 
construct directed loops and worms in Sec. VII. Because 
the directed loops is a further improvement of the SSE 
approach, it is natural to investigate if these concepts can 
also be implemented in the path integral formulation. 



B. Directed loops in the PIM 

To implement the notion of directed loops in the path 
integral formulation we note the similarities of the ver- 
tices in the SSE and the shaded plaquettes in the PIM. 
We can identify a corner of a shaded plaquette with a 
vertex leg in the SSE. Both have a spin attached, and 
each corner (leg) is connected to another corner (leg) on 
another shaded plaquette (vertex). To construct a di- 
rected loop, we first choose a random entrance corner 
at a random shaded plaquette. Then, depending on the 
spin configuration, we choose an exit corner and place a 
directed loop segment between the entrance corner and 
the exit corner. The spins connected by the loop seg- 
ment are flipped with probability one. The spin on the 
exit corner is then the entrance spin of the next shaded 
plaquette and the process continues until the loop closes. 
In contrast to the usual loop algorithm there is no notion 
of freezing loops, but there is the necessary (at least in 
some regimes) process of bouncing where the "loop head" 
backtracks some distance along its path and reverses spin 
flips. 

Because of the relation between the SSE vertices and 
the shaded plaquettes, and the similarity of the matrix 
elements (30) and (p?9|), one can immediately write down 



the the detailed balance equations for the PIM using 
Fig. H and interpreting the vertices as shaded plaquettes. 
As in the SSE, there are eight sets of directed loop equa- 
tions which are reduced to two by symmetries. Substi- 
tuting the plaquette weights and expressing the extended 
configuration weights in terms of the bounce weights we 
get 
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= 1 



K - K, - b' 3 



Non-negative weights are required to avoid sign prob- 
lems. This implies that there are regions where bounces 
must be non-zero. In fact the same "algorithmic phase- 
diagram" as shown in Fig. ^ applies here, with the ex- 
ception that in this case there are no restrictions on C 
(or e = C — A/4 — hb) as it always occurs multiplied 
by At in a combination where there also is the zeroth- 
order term of the exponential. In fact, in the construction 
of the loops in continuous imaginary time, where only 
quantities to order At matters, the value of C drops out 
completely as we will consider ratios where it turns out 
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FIG. 12: Left: Continuous imaginary time construction of a 
loop. This figure can be understood as the limit At — > 
of Fig. [l(], dotted (solid) lines correspond to spin down (up). 
Starting at an arbitrary site and time (indicated by the arrow) 
a probability of "decay" dependent on the spin states of the 
neighbors is calculated, and the loop head is moved to the 
point of decay. Right: A resulting a' decay at a time Td 
where the segment up to the decay has changed orientation 
and a new arrow is placed. 



that C does not occur to order At. Thus in contrast to 
the SSE, there is nothing gained by adjusting C in the 
path integral representation. Whenever in a region of pa- 
rameter space where bounces are needed, one can choose 
them to be the minimum values as summarized in Ta- 
ble Q, with the only modification that the bounce weights 
should be multiplied by At. As in the SSE method the 
actual probability for choosing an exit corner, given an 
entrance corner and a spin configuration on a shaded pla- 
quette, is obtained by dividing one of the weights above 
by the appropriate matrix element from Eqs. (|3S|). 

In the limit At — > this method might seem very 
slow as one needs to make a choice for every plaquette of 
which there are infinitely many in this limit. However, 
one can use the method employed in the continuous time 
implementation of the standard world- line loop algorithm 
j|, which is based on the fact that the c,c' weights are 
of order unity. The c,c' weights describe the process of 
continuing the loop construction in the imaginary time 
direction on the same site. Being of order unity means 
that this will be the dominating process. The other pro- 
cesses are multiplied by At and will therefore occur much 
less frequently. 

To illustrate in detail how a loop is constructed in the 
limit At — > 0, consider as an example the situation shown 
in Fig. [H| This figure shows the full imaginary-time spin 
configurations for four sites. The dotted (solid) lines cor- 
respond to spin down (up). The figure can be understood 
as the limit At — > of Fig. [l0| The loop construction 
consists of moving the loop head. This motion begins 
at a random site and time in a random direction. In 
Fig. ^ the starting point and direction is marked by an 
arrow. From the arrow at time To to the time t\ the 
spin configuration on site 1 and its neighbors and 2 
stay unchanged. At time T\ there is a spin-flip process 
exchanging the spins on sites 1 and 2. This means that 
half of the 2r/Ar, r = t\ — t , shaded plaquettes (the 
factor 2 is from the fact that there are two neighbors) 
between the starting point To and n are of the type W2, 
while the other half is of the type W3. The loop head 



will therefore enter alternately the lower left corner on 
shaded plaquettes having weight W2 and the lower right 
corner on shaded W3 plaquettes. On exiting the shaded 
W2 plaquette, one of the three processes a' , 6' 2 or d can 
happen, while for each of the W3 plaquettes one of the 
processes b,c or 63 can happen. The c and d processes 
are by far the most probable as they are of order unity 
while the others are of order At. Therefore until one of 
the other processes of order At occurs, the loop head will 
just continue its motion in the upward direction on site 
1. The probability for the first occurrence of one of the 
processes of order At within an interval At after time t' 
is given by 



P{t')At = 
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where in the last equality we have taken the limit At 
0, and the quantities a, are finite as At — > 0; 
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(43) 



where the subscript on a indicates which neighbor is con- 
sidered. Recall that by definition W3 = b + c + b' 3 and 
W2 = a + 62 + c. Thus, with a random number generator 
one can generate "decay" times according to the distri- 
bution (041) and take the random decay time generated as 
the point where one of the processes a' , b' 2l b or 03 occurs. 
If the decay time so generated is bigger than n — To the 
loop head can be moved directly all the way to time T\, 
while flipping all the spins on site 1 up to time T\ . There 
it enters a shaded plaquette from the lower left corner. 
This plaquette has weight W\, and the possible choices 
for exit corners are determined by the ratio of the weights 
b[,a' and b' to W\ which all are finite as At — > 0. One 
can hence just use the random number generator to select 
the exit corner. Given that the outcome of this choice is, 
for instance, a' the loop head would move to site 2 while 
flipping spins which changes the shaded plaquette of type 
W\ to be of type W2 ■ The process would then continue in 
the downward direction on site 2. If the decay happens 
before n the loop head moves to the decay point while 
flipping spins and then a choice between the possible de- 
cay types is made. Given that a decay occurs, the choice 
of different types of decays is again independent of At 
as only the ratios matter. As an example, the probabil- 
ity of selecting a' is a' /(a' + bi, + 6 + 63). This type of 
process is illustrated in Fig. [l^ Having made the choice 
the process continues, and the loop closes when the loop 
head reaches the original starting point. 

In practice it is convenient to store the spin-flip events 
in a doubly-linked list for each lattice site so that spin- 
flips can be added and removed efficiently. The main 
computational cost is then to search the site of the loop 
head and its neighbors for spin transitions. 
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In zero magnetic field the directed PIM loop algo- 
rithm proposed here corresponds exactly to the single- 
cluster formulation of the ordinary loop algorithm for 
-1 < A < 1 f| |||. This can be seen by setting all 
bounce weights to zero and C = —A/4, and then compar- 
ing our weights to Eq. (39) in Ref . J49| . In the language 
of the usual loop algorithm, our weight a corresponds 
to horizontal breakups, b to diagonal breakups, and c to 
vertical breakups. The general algorithm with bounces is 
more similar to the worm algorithm || , but the processes 
by which the worm is propagated through space-time are 
different and do not correspond to a solution of our di- 
rected loop equations. This will be further discussed in 
Sec. VII. In Sec. V we will demonstrate that the directed 
loop processes, especially with Solution B (in both SSE 
and PIM implementations) lead to much more efficient 
simulation algorithms. 



operator-loop update |18| (Solution A) and the new solu- 
tion of the directed loop equations discussed in Sec. Ill B 
(Solution B). We also present some results obtained with 
Solution B in continuous-time PIM simulations. 

The physical quantities that we will focus on here are 
the magnetization, 
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the uniform magnetic susceptibility 




(46) 



(47) 



the staggered susceptibility, 



V. AUTOCORRELATIONS 

Autocorrelation functions provide quantitative mea- 
sures of the efficiency of a Monte Carlo sampling scheme 
in generating statistically independent configurations. 
For a quantity Q, the normalized autocorrelation func- 
tion is defined as 



Aq{t) = 



m?) {Q{i)) 2 



(44) 



where i and t are Monte Carlo times, for which we will 
use the unit of 1 MCS (as defined in Sec. II-D in the 
case of SSE, and with an analogous definition for the 
PIM). The brackets indicate the average over the time 
i. Asymptotically, the autocorrelation function decays 
exponentially as ~ e _t / TQ , where the asymptotic auto- 
correlation time tq is given by the slowest mode of the 
simulation (the transition matrix of the Markov chain) to 
which the observable Q couples. For short times, the be- 
havior is typically different for different quantities, even 
if tq is the same. The integrated autocorrelation time is 
defined according to 
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(45) 
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and is the autocorrelation measure of greatest practical 
utility @. 

In this section we will present integrated autocorrela- 
tion times for some important quantities in several re- 
gions of the parameter space of the anisotropic Heisen- 
berg model (Q) . We cannot present a completely exhaus- 
tive study, however, since in addition to the field h and 
the anisotropy A, the autocorrelations also depend on 
temperature T '/ 'J — and the lattice size. In addition, 
in SSE simulations the autocorrelations depend on the 
constant e in the matrix elements ([18]). One of our aims 
here is to find the optimum value if e. We compare simu- 
lations with the original general (non-deterministic) SSE 
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dr(Si(r)Sf(0)), (48) 



and the spin stiffness, 



_ d 2 E{cj)) 



(49) 



where E((f>) is the internal energy per spin in the presence 
of a twist <p in the boundary condition. These quantities 
and their SSE estimators have been discussed in detail 
in Ref. @. 

We note again that the definition of an MCS in the 
generic SSE operator-loop scheme involves some degree 
of arbitrariness, as was discussed in Sec. II D. There is 
also a statistical uncertainty due to the statistical deter- 
mination of the number Ni of operator-loops constructed 
per MCS. In all the SSE simulations discussed here, Ni 
was adjusted during the equilibration of the simulation so 
that on average 2M vertex legs (excluding bounces) were 
visited in each MCS. The maximum expansion power M 
was increased if needed after each equilibration MCS, so 
that M = 1.25 x n max , where n max is the highest power 
n generated so far in the simulation. The statistical un- 
certainties in Ni and n max imply some fluctuations in 
the definition of an MCS. This, in turn, results in fluc- 
tuations in the results for the integrated autocorrelation 
times that can be larger than their statistical errors. Typ- 
ically, these fluctuations are only a few percent, however, 
and are hence not problematic. 

In the PIM simulations, we adjusted Ni so that on av- 
erage the total length (again excluding bounces) of all 
Ni loops in a MCS is equal to (3N\ the space-time vol- 
ume. The definitions of an MCS in SSE and PIM sim- 
ulations are hence similar but not identical. One rea- 
son why it is difficult to construct exactly comparable 
MCS definitions in the SSE and the PIM is that the di- 
agonal single-operator updates carried out separately in 
SSE are in effect accomplished during the loop construc- 
tion in the PIM. Another difference is that there is no 
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FIG. 13: Integrated autocorrelation times vs external field for 
the magnetization of an N — 64 Heisenberg chain at /3 = 16. 
The upper and lower panels show results of simulations using 
Solutions A and B, respectively. Several values of the constant 
e were used, as indicated by the legends in the lower panel. 
The inset shows the magnetization itself. 

adjustable constant e in the PIM. In Ref. j|4| an alter- 
native approach of normalizing the autocorrelation times 
by the actual number of operations performed was used. 
However, also this definition may be ambiguous since it 
depends on the details of the implementation, and there 
are also differences in the actual CPU time consumed de- 
pending on the mix of operations (integer, floating point, 
boolean, etc.). These issues are not of major significance 
in the calculations we present below, but should never- 
theless be kept in mind when comparing autocorrelations 
for the two methods. 

The reminder of this section is organized as follows: In 
A we first discuss SSE simulations of the ID Heisenberg 
model in an external field. In B we consider SSE simula- 
tions of 2D systems in fields and with anisotropies. PIM 
results for both ID and 2D systems are presented in C. 
We have also studied several isotropic systems at crit- 
ical points and extracted the dynamic exponent of the 
simulations. We discuss these results in D. 



A. SSE simulations in ID 

When the constant e = 0, the vertices with all spins 
up or all spins down are excluded from the SSE configu- 
ration space when h — 0, since the corresponding matrix 
elements ( fl8| ) then vanish. When h > 0, the all-up ver- 




FIG. 14: Integrated autocorrelation times vs external field for 
the staggered susceptibility of an N = 64 Heisenberg chain at 
/3 = 16. The inset shows the staggered susceptibility. 



tex is again allowed. With e > all vertices are allowed 
and the propagation of the loop is then more random. 
We here begin by studying how the simulation efficiency 
depends on e in the case of the ID Heisenberg model 
(A = 1) in a field. 

Figs. [1^ and |l4] show the field dependence of the inte- 
grated autocorrelation time of the magnetization and the 
staggered susceptibility in simulations of chains with 64 
sites at inverse temperature (3 — 16. As shown in the in- 
set of Fig. [l3|, at the T = saturation field (h sat / J = 2 in 
ID) the magnetization is about 10% from saturation at 
this temperature. The staggered susceptibility is peaked 
at h = 0, reflecting the fact that the staggered spin-spin 
correlation function for spin components parallel to the 
external field is dominant only in the absence of a field. 
In the case of Solution A simulations, the effect of in- 
creasing e from is an initial small drop in Ti, lt [M] for 
fields h < 0.8 and a small increase at higher fields. There 
is a substantial increase in T m t[Xs] for weak fields. As e is 
further increased there is a small increase in T mt [M] also 
for weak fields. In contrast, with Solution B increasing 
e has favorable effects on both autocorrelation times up 
to the highest e studied here. The effects are very small 
for high fields, however, since there the autocorrelation 
time is already close to its lower bound 0.5 when e = 0. 
For all e-values, the autocorrelation times are consider- 
ably shorter with Solution B than with Solution A. This 
shows that the strategy of decreasing the probability of 
the bounce processes in the operator-loop construction 
is working. The effects are particularly pronounced at 
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FIG. 15: Bounce probabilities in Solution A and B simulations 
of an N = 64 Heisenberg chains at j3 = 16, using different 
values of e. 



and close to h = 0, where the shortest autocorrelation 
times with Solution B are only about 10% of those with 
Solution A. 

In Fig. [l5| we show the probability of bounces in the 
simulations (-Pbouncc is the fraction of bounces, including 
length-0 loops). The behavior reflects that of the au- 
tocorrelation times. With Solution B, flounce decreases 
monotonically with e for all fields, whereas with Solution 
A the behavior is non-monotonic. In Solution B, the van- 
ishing of Pbouncc both in the limits h — > and h — ► h sat / J 
(at T = 0) follows by construction, as discussed in Sec. II. 
With Solution A the bounce rate is large in these limits. 

For weak fields, a small e > has favorable effects on 
the magnetization autocorrelations both with solution A 
and B. In the case of Solution B, both Tj nt [M] and Tint \ Xs) 
continue to decrease also when e sa 1, as seen in Figs. 
and [l4| Nevertheless, it is not practical to use a very 
large e since the average expansion order (n) (and hence 
the operator sequence size M) has a contribution e/3-/V&, 
and there is a similar increase in the number of opera- 
tions needed to carry out one MCS. However, Figs. [l3| 

1/4), gives 



and 
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indicate that even a small value, (e 
a significant improvement of the magnetization autocor- 
relations relative to e = simulations. We find that this 
behavior persists also for larger system sizes and lower 
temperatures. Fig. |l6| shows Tj nt [M] for different sys- 
tem sizes N at inverse temperature (5 — N/4, using both 
e = and 1/4. The advantage of e = 1/4 becomes more 
pronounced with increasing system size. For N = 128 
the maximum Tj n t[JVf] is reduced by about 50% for both 
Solution A and B. The relative advantage of Solution B 
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FIG. 16: Integrated autocorrelation times for the magnetiza- 
tion in simulations of chains of different lengths N at inverse 
temperature j3 = AT/4. The inset shows the magnetization. 



over A is again the most dramatic in the limit h — > 0. In 
both solutions, the autocorrelation time is rather strongly 
peaked, with the peak position for the largest systems 
at slightly higher fields for Solution B. The reason for 
this type of field dependence is not clear and deserves 
further study. It cannot be ruled out that a still more 
efficient directed loop solution could be found at interme- 
diate field strengths (which would imply that minimizing 
the bounce probability does not necessarily lead to the 
most efficient algorithm). 

When the temperature becomes small compared to the 
finite-size gaps in the system, a step structure in the mag- 
netization versus field curve can be clearly resolved, as 
is shown in Fig. |l7](a). These steps are also reflected in 
the autocorrelation time, as shown in Fig. |l7|(b). There 
are sharp maximas in the regions where the magnetiza- 
tion switches between two values. Exactly at T = 0, 
the autocorrelation function (Q) for the magnetization 
is ill-defined, since there are then no fluctuations in M 
on the magnetization plateaus. However, we find that 
the limit T — > is well-behaved in the simulations. 
Exactly at the switching fields, r; n t[Af] appears to di- 
verge, however, showing that tunneling between the two 
equal-probability magnetization sectors becomes rare. 
Fig. |l7](c) shows the average size of the operator-loops. 
There are maxima at the switching fields, with the peak 
heights growing as the temperature is lowered. On the 
plateaus, the loop size does not change much with f3. A 
divergence of the average loop size with (3 at the switching 
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FIG. 17: Magnetization vs field of an TV = 64 chain (a), the 
corresponding integrated autocorrelation time (b), and the 
average size of the operator- loops (c). Solid and open circles 
show results at j3 = 64 and 128, respectively. The simulations 
were carried out with Solution B of the directed loop equation 
with e = 1/4. 



fields can be expected, since in order for the magnetiza- 
tion to change, the loop has to wrap around the system 
in the SSE propagation (or imaginary time) direction, 
which is of length ~ (3. The convergence of the aver- 
age loop size on the plateaus can be understood on the 
same grounds. Apart from the oscillations, there is also a 
significant increase in the loop size as the field increases. 

Distributions of loop sizes at j3 = 128 are shown in 
Fig. [l^ for field strengths corresponding to magnetiza- 
tion plateaus (h/J = and 0.14) and switching fields 
(h/J — 0.07 and 0.21). At h — 0, there are no bounce 
processes and this appears to be reflected as a qualita- 
tively different loop size distribution than for h > 0, with 
no very large loops and a larger probability of sizes in the 
range 2 s — 2 11 . For all fields, there is a quite sharp cross- 
over beyond which the probability becomes very small. 
Problems with loops that do not close Q are there- 
fore absent in this case. We did not have to impose any 
maximum size during the loop construction in any of the 
simulations discussed in this paper. 

In the studies of the ID Heisenberg model in a field 
that we have presented here, the new Solution B is clearly 
better than Solution A, although the difference is very 
large only for h close to (but significant also for h — > 
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FIG. 18: Loops size distribution for N = 64 chains at j3 = 
128 and different field strengths (Solution B simulations with 
e = 1/4). P(m) is the cumulative probability of loop sizes 
between 2 m (0 for m = 0) and 2 m+1 - 1. 



h sa t). Already with solution A the autocorrelation time 
for the magnetization is very short compared to other 
approaches. With the continuous-time worm algorithm 
7i nt [M] is close to 100 even for system sizes as small as 
N = 10 and N = 20 [Eol. 



B. SSE simulations in 2D 

For the 2D XXZ-model (on periodic L x L lattices), 
we have calculated autocorrelation times versus the field 
strength in systems with isotropic couplings (A = 1, < 
h < h sa t = 4J), Ising- anisotropic systems in zero field 
(A > 1, h = Q), and the XY-model in zero and finite 
field (A = 0, h/J = 0,1/2). 

Fig. [n] shows the field dependence of the autocorrela- 
tion time for the magnetization of L = 16 and L = 32 
systems at inverse temperature /3 = 8. With Solution 
A at e = 0, a sharp drop in the autocorrelation time 
can be noted immediately when h becomes non-zero. It 
is not surprising that the algorithm at h = is inef- 
ficient, since the only processes occurring here are the 
switch-and-reverse and the bounce (see Fig. ||). The 
bounce probability is high if it is not excluded "by hand" , 
which would yield the much more efficient deterministic 
loop rules. With the bounce included, the actual closed 
loop is still deterministic but during its construction the 
propagating open end oscillates randomly back and forth 
along the defacto deterministic trajectory until the loop 
closes or is annihilated via back-tracking all the way to 
the starting point. Once h is non-zero, the loops be- 
come manifestly non-deterministic (since an additional 
vertex path becomes allowed) and apparently, as seen in 
Fig. [l9|, even for a very small h the simulation is much 
more efficient. This is in contrast to the ID case (see 
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FIG. 19: Integrated autocorrelation times for the magneti- 
zation in simulations of the 2D Heisenberg model in a mag- 
netic field, using Solution A (circles) and B (squares) and 
constants e = (filled symbols) and 1 /4 (open symbols) . The 
inset shows the magnetization (the differences in M between 
L = 16 and L — 32 are very small at the inverse temperature 
P = 8 used here). 



Fig. |l6|), where Solution A with e = is reasonably ef- 
ficient even for h — and r; n t[M] increases when h is 
turned on. This difference between the ID and 2D sim- 
ulations may be related to the loop sizes (although the 
full explanation probably is more complex and related 
to the different physical properties of the systems, which 
arc reflected in the loop structures). In ID, the loops are 
relatively small, and for a small h a large fraction of the 
constructed loops are then identical to the deterministic 
ones at h = 0. In 2D the loops are much larger, and then 
even a small h can allow most paths to "escape" from 
the h = deterministic loop trajectories so that there 
are not as many propagations back and forth along the 
same path as at h = 0. Using a non-zero e also makes 
the path non-deterministic, and Fig. [l^ shows very fa- 
vorable effect of using e = 1/4 in Solution A at h = 0. 
For higher fields, there are only very minor advantages of 
a non-zero e, which is also in contrast to the ID case. As 
in the ID case, Solution B reduces the autocorrelation 
times very significantly at weak fields, and substantially 
also at higher fields. The differences between e = and 
1/4 in Solution B are small at all fields, however. 

Fig. ^O] shows autocorrelation times for the staggered 
susceptibility of Ising- anisotropic systems in zero field at 
(3 = 8. Solution B performs significantly better than 
Solution A for A < 1.5, but only marginally better at 




1 1.5 2 2.5 3 

A 



FIG. 20: Integrated autocorrelation times for the staggered 
susceptibility in simulations of the 2D anisotropic Heisenberg 
model at j3 = 8. The symbols indicate Solutions A, B, and 
e = 0, 1/4 in the same way as in Fig. O. 
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FIG. 21: Upper panel: Bounce probabilities in simulations 
of a 32 x 32 anisotropic Heisenberg model at j3 = 8. Lower 
panel: The average loop size in the same simulations. 
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higher A. In this system e = implies that the closed 
loops are defacto deterministic for all anisotropies (the 
only allowed e = vertex processes are again the bounce 
and the switch- and- reverse). However, the symmetry of 
flipping and flipping back loops is broken when A > I and 
the defacto fixed structure of the closed loops is not taken 
into account during their construction, neither with Solu- 
tion A nor B (doing this would correspond to neglecting 
the bounces, constructing a deterministic loop and then 
taking A into account in a Metropolis acceptance prob- 
ability for actually flipping the loop, in a way analogous 
to what has been done with the standard world-line loop 
method for weak magnetic fields p5||). Solution B mini- 
mizes the bounce probability and hence leads to more di- 
rected paths and, therefore, closing of the loops in fewer 
steps (and hence a larger number of completed loop in an 
MCS as defined here). Bounce probabilities are shown in 
the upper panel of Fig. When e > the loops become 
manifestly non-deterministic, leading to significantly re- 
duced autocorrelation times. The bounce probabilities 
are also reduced, but for both solutions flounce still be- 
comes large as A is increased. Nevertheless, the autocor- 
relation times continue to decrease. We do not expect 
this to be the case as A — > oo, where the model at fixed 
f3 reduces to the classical Ising antiferromagnet at tem- 
perature T — > 0. In that limit, a classical single-spin flip 
would correspond to flipping spins on all SSE vertices on 
a given site (the number of which scales as /3A), which 
would be a slow process since the bounce probability is 
high. The lower panel of Fig. [H] shows that the average 
loop size becomes very small for large A. The algorithm 
clearly does not reduce to a classical Swendsen-Wang or 
Wolff cluster algorithm as A — > oo (in the classical al- 
gorithms the cluster size — > N as T — > 0). However, at 
higher temperatures the algorithm could easily be supple- 
mented with a cluster update which corresponds exactly 
to the classical one (a multi-spin generalization of the 
flips of "free" spins, where clusters of spins connected to 
each other by operators in Sm can be flipped simultane- 
ously without changing the weight if e = and h = 0). 
As in the standard world- line loop algorithm p9| , it is 
also possible to include loop-freezing in the deterministic 
operator-loop algorithm. 

Note that there is essentially no structure in the Solu- 

1/4 in Fig. 20, in spite 



tion B autocorrelation time for e 
of the fact that the scan over anisotropies should cross 
an Ising-type transition to an ordered state. At A = 3 
the antiferromagnetic order is already at sw 97% of the 
maximum (classical T = 0) value, as can be inferred from 
the insets of Fig. ^ by using Eq. @. 

For the XY-model (A = 0), the directed loop equa- 
tions have a solution without bounces for all fields up 
to the saturation field. We find that the resulting algo- 
rithm is very efficient, with autocorrelation times smaller 
than one for all system sizes and temperatures that we 
have studied. Fig. ^2| shows results for the spin stiffness 
as a function of temperature for zero field as well as at 
h/ J = 0.5. The corresponding autocorrelation times are 




FIG. 22: Spin stiffness of the XF-model at zero external field 
(upper panel) and at h/J = 0.5 (lower panel). The insets 
show the corresponding integrated autocorrelation times. So- 
lution B with e = e m i n (see Table I) was used in all cases. 



peaked around the Kosterlitz-Thouless (KT) transition 
temperature but do not grow with the system size. The 
KT transition in the h = system has been studied to 
high accuracy using a continuous-time world-line loop al- 
gorithm, with the result T KT /J w 0.342 Q. Our h = 
data are in complete agreement with the previous results. 
We find that the data for h = 0.5 shown in Fig. ^2] can be 
collapsed onto the h = data if T and p s are both scaled 
by the same factor (w 1.05 for h/J= 0.5), in accord with 
the universality of the transition. More extensive results 
for this model will be presented elsewhere. 



C. PIM simulations 

Next we will show some results for autocorrelation 
times obtained using the PIM implementation of the di- 
rected loop algorithm. To make a reasonable compari- 
son with the autocorrelation times for the SSE, we will 
also in the PIM define a MCS so that it includes Ni 
loops, where Ni is determined such that on average the 
total path length, excluding the first path segment im- 
mediately following each bounce, of all Ni loops in an 
MCS is equal to (3N; the space-time volume (in the PIM, 
each path segment has a length in imaginary time, in 
contrast to the SSE where the steps are just counted). 
This definition is chosen so that it corresponds reason- 
ably closely to the definition used in the SSE. However, 
it could be argued that a better definition of the total 
path length would be to add all the path segments but 
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FIG. 23: Upper panel: Integrated autocorrelation times vs 
external field for the magnetization (solid circles) and stag- 
gered susceptibility (open circles) in PIM simulations of an 
N = 64 Heisenberg chain at (3 = 16. Lower panel: Bounce 
probability vs external field in the same simulation. 



instead of excluding the segment immediately following 
a bounce one would subtract the part of the path imme- 
diately following a bounce that overlaps with the path 
segment preceding the bounce (with special care taken 
for consecutive bounces). This would more accurately 
take into account the fraction of spins actually flipped. 
We have here used the first definition of the MCS as it 
corresponds more closely to how we define an MCS in the 
SSE method (where a different treatment of the bounces 
could of course also be implemented — see Sec. II D). 

Generally speaking the computer implementation of 
the PIM is more complex than SSE, as it is always nec- 
essary to keep track of the spin states on neighboring 
sites in the PIM. This is not required in the SSE for- 
mulation, where the vertices contain all the information 
needed. Therefore our computer code for the PIM is not 
as efficient as the SSE code in generating a single MCS, 
and so we will be content in this section to show just a 
few PIM autocorrelation results. As Solution A of the 
directed loop equations was already shown above to be 
much less effective than Solution B, we will in this section 
just show results for Solution B. 

Fig. ^3| shows the integrated autocorrelation times 
Tint [M] and Ti nt [x»] for a 64-site Heisenberg chain (A = 1) 
at inverse temperature /3 — 16 as functions of the mag- 
netic field. Comparing with the SSE results in Figs. [13| 
and [l4|, it is seen that Tint [M] is comparable to the e = 1 
case while ri n t[xs] is more similar to the e = curve, ex- 
cept close to h = where it also behaves more like the 




FIG. 24: Field dependence of the integrated autocorrelation 
times for the magnetization in PIM simulations of chains of 
different lengths N at inverse temperature = N/4. 



e = 1 case. The lower panel of Fig. [2^ shows the field 
dependence of the bounce probability. Pbouncc is here de- 
fined as the number of bounces divided by the total num- 
ber of times the path building changes, either by moving 
to a neighbor site or by back-tracking. This measure is 
not directly comparable to the definition in the SSE case, 
as the moves c and c', where the path continues on the 
same site, are not counted in the denominator of Pbounce 
(they are infinitely many in the PIM). Nevertheless, the 
general behavior of -Pbouncc versus h is the same for the 
two methods. 

In Fig. we have plotted r; n t[M] as a function of 
magnetic field for different chain sizes N. In all cases f3 = 
N/4. As in the SSE case (Fig. |l6|) we see an increase in 
Tint[-M] with system size for small to intermediate fields. 
However, the maximum PIM autocorrelation times are 
about 50% smaller than in the SSE e = 1/4 case. 

We have also carried out simulations of the 2D Heisen- 
berg model using the PIM. In Fig. ^5|we show results for 
Tint[-M] for a 16 x 16 lattice at j3 = 8. Here the behavior 
is almost identical to the SSE results shown in Fig. [ll], 
where there is only a small dependence on e. 

From these examples it can be seen that the PIM gen- 
erally has shorter autocorrelation times than SSE in cases 
where the SSE results show a significant dependence on 
the constant e. In some sense the PIM corresponds to 
the e — > oo limit of SSE, as in this limit the continue- 
straight processes also dominate the loop construction in 
SSE. In cases where the SSE autocorrelations converge 
slowly to their e = oo limit the PIM approach may hence 
be more efficient (since in SSE the computation time for 
one MCS grows linearly with e in this limit). However, 
in assessing a method's efficiency one should also take 
into account the cost of performing a single MCS. This 
of course depends heavily on the actual computer imple- 
mentation of the directed loop algorithm. That is, what 
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FIG. 25: Integrated autocorrelation times (PIM) vs external 
field for the magnetization of an 16 x 16 Heisenberg square 
lattice at /3 = 8. 



kind of data structures are used to represent the spin and 
vertex configurations, what kind of search algorithms are 
used for finding spin states at a given time in the PIM, 
e.t.c. While we do not attempt to compare the PIM and 
SSE in this respect here, it is quite clear that it is often 
easier to find a fast and effective implementation for the 
SSE than for the PIM. We also note that the convergence 
to the e — > oo limit in the SSE is relatively fast in all cases 
we have studied so far. The convergence appears to be 
slowest in ID, but even there the reduction of the au- 
tocorrelation times becomes small beyond e = X, where 
they are similar to the PIM autocorrelations. 



D. Dynamic exponent 

An interesting question is how the autocorrelation time 
diverges with the system size in simulations at a critical 
point. The ID Heisenberg model at h = 0, T = exhibits 
power-law (1/r) decay of the staggered spin-spin corre- 
lation function and is a hence a quantum critical system 
||50t . We have studied the integrated autocorrelation time 
for the staggered spin susceptibility in this model as the 
system size N is increased and the inverse temperature 
/3 = N/4. The staggered susceptibility should couple to 
the slowest mode of the simulation, and its autocorrela- 
tion time is therefore expected to diverge asymptotically 
according to a power law; 



Tint [Xs 



(50) 



where z is the dynamic exponent of the simulation. Note 
that it is here essential that f3 and N are taken to infinity 
at a fixed ratio (as the physical dynamic critical exponent 
relating space and imaginary time is 1). It is interesting 
to compare SSE simulations with Solution B at different 
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FIG. 26: Autocorrelation times for the staggered susceptibil- 
ity of the isotropic Heisenberg chain at /3 = N/4 obtained in 
SSE and PIM simulations with Solution B. The dashed line 
has slope 0.75. 



e-values (we do not consider solution A here since it is 
much less efficient than Solution B). It is also interesting 
to compare the two possible ways of flipping the loops 
when e = 0. At h = 0, e = 0. Solution B reduces to the 
deterministic operator-loop Q. As discussed in Sec. II 
D, instead of constructing a fixed number of loops per 
MCS at random, all loops can then be constructed and 
flipped independently of each other with probability 1/2. 
This is analogous to the Swendsen-Wang |l6] algorithm 
for the classical Ising model. For the Ising model, it is 
known that it is more efficient (i.e., z is smaller) to con- 
struct the clusters one-by-one using the Wolff algorithm 
©■ 

In Fig. ^6] we show results of Solution B simulations 
with e = and 1/4 along with results from e = sim- 
ulations were all clusters were constructed. The auto- 
correlation times of the two e = simulations are very 
similar, but for large systems marginally shorter when all 
clusters are constructed. Hence, there is here no advan- 
tage in constructing the clusters one-by-one. This is most 
likely related to the fact that in order to change the loop 
structure in the SSE simulations at h = 0, e = 0, diago- 
nal updates also have to be carried out. In the scheme 
used here, diagonal updates are only performed at the 
beginning of each MCS, and hence the same loop can be 
constructed several times in one MCS if they are con- 
structed at random. It is then more efficient to construct 
all loops once. In order to achieve an advantage similar to 
the Wolff algorithm, one would have to construct a new 
scheme for the diagonal updates, which certainly could be 
possible but which we have not yet attempted. As in the 
other cases we have discussed above, there is a significant 
improvement when e — 1/4 is used in Solution B. How- 
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FIG. 27: SSE autocorrelation times for the staggered and 
uniform susceptibility of the Heisenberg bilayer close to its 
quantum critical point (J±/J = 2.5225 was used). The in- 
verse temperature f3 — L/J. 



FIG. 28: Autocorrelation times for the staggered susceptibil- 
ity and the spin stiffness of the 3D Heisenberg model close to 
its critical temperature (T/J = 0.944 was used). The lines 
correspond to scaling ~ L 1//4 . 



ever, the dynamic exponent appears to be the same in 
all cases; z w 0.75. In Fig. ^ we also show PIM results. 
It is clear that the autocorrelation times here are sig- 
nificantly shorter but most likely the dynamic exponent 
is the same as in SSE. The shorter PIM autocorrelation 
times are consistent with the ID results shown above in 
Sees. V A and C, and clearly we could also reduce the 
SSE autocorrelations by increasing e further. 

In 2D, a well studied quantum critical system is 
the Heisenberg model on two coupled layers (bilayer), 
with intra-plane coupling J and inter-plane coupling Jj_ 
|f3lf . The T = antiferromagnetic long-range order 
in this model vanishes at a critical inter-plane coupling 
(J±/J) c ~ 2.525 ^8|. Some autocorrelation results for 
both SSE and PIM simulations of this model at J±/J = 
2.524 have been presented recently [Q and indicate that 
the dynamic exponent z w in both methods. Our 
most recent simulations indicate that (J±/J) c ~ 2.5225, 
i.e., slightly lover than the previous estimate |28|. In 
Fig. we show integrated autocorrelation times for sev- 
eral quantities at this coupling, using both e = and 1/4 
in Solution B simulations. In the e = case, all clusters 
were constructed in each MCS. We again note significant 
shorter autocorrelation times in the non-deterministic 
simulation (e = 1/4). However, the deterministic sim- 
ulation is significantly faster. One deterministic MCS at 
e = typically only requires 50 — 60% of the CPU time 
of a generic Solution B MCS at e = 1/4. The net gain in 
simulation efficiency with e > is therefore only marginal 
in this case. All our results are consistent with z = 0, al- 
though with e = the convergence to a size-independent 
behavior is rather slow. We have not carried out PIM 
simulations of this system. 

Next we consider the 3D Heisenberg model, which un- 



dergoes a phase transition to an antiferromagnetic state 
at a non-zero temperature [Q. According to recent SSE 
simulations, using systems with N — L 3 sites and L up 
to 16, the critical temperature T c /J = 0.946 ±0.001 p[. 
These simulations were carried out using only local up- 
dates. With the operator-loop update, much larger sys- 
tems can be studied. We have carried out simulations for 
L up to 48 close to the critical temperature. Based on 
the results, we believe that T c is at the low end of the pre- 
vious estimate, likely very close to 0.944. Fig. |2^ shows 
autocorrelation times for the staggered susceptibility and 
the spin stiffness at T/J = 0.944, obtained using the de- 
terministic SSE algorithm with e = (constructing all 
clusters during each MCS) and Solution B with e = 1/4. 
Here the e = results are initially consistent with a dy- 
namic exponent z ss 0.25, but for the largest sizes there 
seems to be a change in behavior, possibly a convergence 
corresponding to z = 0. The e = 1/4 simulation is fully 
consistent with z = 0. 



VI. LOW-FIELD MAGNETIZATION OF THE 2D 
HEISENBERG MODEL 



As an example of an application made possible with So- 
lution B of the directed loop equations, we here present 
SSE simulation results for the 2D Heisenberg model in 
a weak magnetic field. At very low temperatures, the 
field dependence of the magnetization exhibits a step- 
structure due to the gaps between the lowest-energy 
states with magnetization m z — 0,±1,±2,... ± N/2. 
These gaps can be extracted from the calculated magne- 
tization curve. For the isotropic Heisenberg model, the 
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FIG. 29: Total magnetization vs external field in the 2D 
Heisenberg model with L — 64 at two inverse temperatures. 
The curves were calculated using four fitted energies El{S) 
(the same for both curves). 



gaps are exactly the gaps between the degenerate spin 
multiplets with total spin S = 0, 1, ... in the absence of 
the field. 

In an antiferromagnetically ordered system, such as 
the 2D Heisenberg model, the energies of the 5 > 
multiplets relative to the 5 — ground state should 
correspond to the excitations of a quantum rotor when 
5 <C VN. The over-all energy scale can be related to the 
uniform (transverse) magnetic susceptibility J54|: 



E(S) 



5(5 + 1) 



(51) 



where L 2 = N. The asymptotic validity of this relation 
has been verified using quantum Monte Carlo estimates 
for small 5 and L up to 16 J55|, [56). Recently, a slow 
convergence of the spectrum for 5 ~ L was been pointed 
out for the 2D Heisenberg model with spin-1/2 |p6| . A 
systematic study of El(S) for systems larger than L = 16 
was not possible, however, because of the large statistical 
errors in the energy differences. 

With the directed loop algorithm we can instead ex- 
tract the energy gaps using the field dependence of the 
magnetization. As we have shown in Sec. V, the new 
Solution B shortens the autocorrelation times very sig- 
nificantly for low fields, which is what we need in order to 
accurately extract the energy levels for 5 ranging from 
to ~ L. We will present our complete results of such cal- 
culations elsewhere. Here we will demonstrate the power 
of the new method by focusing on the first few levels for 
system sizes L up to 64, i.e., the number of spins is 16 
times larger than in the previous studies |55], [56j . 

In order to see the step-structure needed to extract the 
energy levels El(S) for small 5, the temperature has to 



be below the 5=1 gap, which according to Eq. fl5l] ) and 
previous estimates of the susceptibility (x± ~ 0.065/ J) is 
approximately 0.004/ J for L = 64. In practice, we have 
used inverse temperatures f3 corresponding to roughly 
1/10 of the gap. We have fitted the numerical results 
to a magnetization curve (m z ) calculated using energy 
levels of the form 



E L (S,m z )=E L (S)-hm z 



0,±1,...±5, (52) 



at the same temperature as in the simulations. We adjust 
the energies El(S) to give the best match between the 
calculated and theoretical magnetization curves. Fig. |2^ 
shows results for L = 64 at (3 = 2048 and 4096. We 
used the same fitted levels El{S) at both temperatures 
(clearly, the 5=1 level completely dominates the j3 = 
4096 results, which include only the first magnetization 
step). As in Ref. ^6|, we define a spin- and size-dependent 
susceptibility using the energy levels El(S) obtained in 
this fitting procedure; 
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(53) 



and extrapolate data for fixed 5 to infinite size in or- 
der to determine the thermodynamic susceptibility x±- 
Fig. |3^ shows our results for 5 = 1, 2, 3 and system sizes 
ranging from L = 8 to L = 64. The results up to L = 16 
agree very well with those presented previously |5tj] , but 
our statistical errors are considerably smaller. The col- 
lapse of the three curves onto each other for large sys- 
tems demonstrate the validity of Eq. ( pl| ) for small 5. 
Extrapolating the three data sets to infinite size gives 
the susceptibility x± = 0.0659 ± 0.0002, again in good 



agreement with Ref. 56 but with a considerably reduced 
statistical error. 

For the L = 64 simulations at j3 = 4096, the CPU time 
needed to perform one MCS is approximately 40s on an 
Intel Pentium III running at 866 Mhz. The results shown 
in Fig. H are based on 3 - 8 x 10 4 MCS for each data 
point. 



VII. SUMMARY AND DISCUSSION 

We have introduced the concept of directed loops in 
stochastic series expansion and path integral quantum 
Monte Carlo and implemented them for simulations of 
the 5 = 1/2 XXZ-model in an external magnetic field. 
The directionality of the loop reflects the asymmetry be- 
tween the operation of flipping the spins along the loop 
and the reverse operation of flipping back those spins. 
Such an asymmetry is not present in the standard world- 
line loop algorithm ^, [r], , which as a consequence is 
restricted to certain regions of parameter space. Quite 
generally, there is a hierarchy of three classes of directed 
loops. In the most general case the loop can back track 
during construction. In some regions of the parameter 
space the back tracking can be excluded, and in some 
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FIG. 30: Inverse susceptibility extracted using the energies of 
the S = 1, 2, and 3 multiplets. The curves are quadratic fits. 



further restricted regions the loops become symmetric 
(non-directional) and reduce to the type of loops previ- 
ously considered for world-line [Q, [H| [l7| and SSE Q 
simulations. Hence, the directed loop framework consti- 
tutes a natural generalization of the loop-cluster concept 
]l6|| . We have shown that the transitions between the 
different levels of the hierarchy can be made smooth by 
minimizing the probability of back-tracking when solving 
the directed loop equations. We have also demonstrated 
that the algorithm based on this solution works very well 
in the full parameter space of the XXZ-model. 

Our scheme appears to be much more efficient than the 
worm algorithm for continuous-time path integral simu- 
lations fi, which also is applicable in the full parameter 
space but does not exhibit the three-level hierarchy of 
the directed loops (at least not in its current formula- 
tion) . The configuration space involving two moving dis- 
continuities was used first in the worm algorithm for the 
purpose of measuring off-diagonal correlation functions 
(Greeen's functions). It was also the first method that 
was practically useful in the presence of external fields. 
It is, however, not the presence of the discontinuities that 
makes the worm algorithm and SSE operator-loop jl8j al- 
gorithm applicable in the presence of external fields. One 
can also think of the construction of the standard world- 
line loops jl6| |4j| in terms of moving discontinuities, but 
they are more constrained in their motion and therefore 
cannot take external fields into account. Hence, it is 
the rules for moving the discontinuities that determine 
whether or not a loop or worm simulation is efficient. 
The directed loop equations constitute a framework for 
optimizing these rules. Below we will comment on the 
similarities and differences between worms and directed 
loops. 

The operator-loop update previously constructed for 



SSE simulations |18 corresponds to a particular solu- 
tion (A) of the directed loop equations. We have here 
constructed a different solution (B), which minimizes 
the probability of back tracking in the loop construc- 
tion and therefore is more efficient. The new solution B 
completely eliminates back-tracking (bounce processes) 
in the XXZ-model for z-anisotropies — l<A<luptoa 
finite external field h (up to the saturation field for A = 
and only exactly at h = for |A| = 1). In other interest- 
ing parameter regions the bounce probability is typically 
a few percent or less. Our simulation results show that 
the new solution can decrease the autocorrelation times 
by up to an order of magnitude or more in cases where 
Solution A is the least efficient (at weak and intermedi- 
ate magnetic fields and anisotropies). The algorithmic 
discontinuity of the previous approach (which amounted 
to using a very efficient deterministic algorithm at h = 
and the much less efficient generic Solution A for h > 0) 
is hence avoided with Solution B, where the bounce prob- 
abilities and the autocorrelation times smoothly connect 
to those of the deterministic algorithm. However, our re- 
sults also indicate that the deterministic loop construc- 
tion at h — is not always the most efficient. With a 
non-deterministic solution (Solution B with the constant 
e > in the bond operator) the operator paths becomes 
more random, which has a favorable effect on the auto- 
correlations. 

In addition to being more efficient in terms of the auto- 
correlation times measured in units of our defined MCS, 
Solution B is also typically faster as the number of oper- 
ations required to perform one MCS is smaller (because 
of the smaller bounce probability). In terms of ease of 
implementation, Solution A is more straight-forward as 
it is directly given in terms of matrix elements of bond- 
operators. In order to implement Solution B for a new 
Hamiltonian, one first has to investigate the subclasses 
of vertices with their directed loop segments and then 
minimize the bounce probabilities for all non-equivalent 
classes. SSE with Solution A (and other special solu- 
tions for Heisenberg and XY-models) have already been 
used for a number of different lattices and Hamiltoni- 
ans [§[ ^, H §§ ||, H |H |7[ H ||, @, 
but so far we have only investigated Solution B for the 
XXZ-model discussed in this paper. We expect general- 
izations to a wide range of other models to be relatively 
straight-forward. 

In the continuous-time path integral, Solution B of the 
directed loop equations for zero field and |A| < 1 re- 
sults in an algorithm identical to the standard world-line 
loop algorithm @, ||. The generic algorithm, which in- 
cludes a probability of back-tracking as the loop is con- 
structed, has some features in common with the worm 
algorithm Q . The extended configuration space with an 
open world-line segment (the worm) is the same in the 
two methods (and is analogous also in the SSE operator- 
loop construction, although the representation there is 
discrete rather than continuous). However, there are im- 
portant differences in the actual processes used to prop- 
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agate the path (or worm). In the worm algorithm the 
"jump" and "reconnection" procedures involve the cre- 
ation or annihilation of a kink, in which one of the worm 
ends jumps from one site to another and spins are flipped 
on finite equal-length segments of imaginary time at both 
the initial and final sites |J . The location in time of the 
worm end does not change in these processes, but is ac- 
complished in separate updates. In the PIM directed 
loop scheme, the movement in imaginary time and the 
creation (or annihilation) of a kink is combined, and in 
each step spins are flipped on a finite segment of imag- 
inary time at a single site only. This dynamics follows 
naturally from the vertex-representation introduced for 
the SSE operator-loop algorithm Q , where a single spin 
is flipped on a link connecting two vertices and the possi- 
ble sites (the same or a specific neighbor site) and direc- 
tion (forward or backward) for the next step is dictated 
by the four legs of the vertex. Here we have directly 
translated this dynamics into the path integral simula- 
tion by borrowing ides from the continuous-time loop al- 
gorithm |Q. The simulation dynamics is hence different 
from the worm algorithm, and the worm algorithm does 
not correspond to a solution of the directed loop equa- 
tions. Our autocorrelation results show that the directed 
loop scheme is much more efficient than the worms in 
simulations of the Hciscnberg chain in a magnetic field, 
for which our measured autocorrelation times for small 
systems are almost two orders of magnitude smaller than 
those reported for the worm algorithm [[lo). We expect 
the superior performance of the directed loop scheme to 
be quite general, as the bounce minimization achieved 
with Solution B has no counterpart in the worm algo- 
rithm (although it may be possible to develop a general- 
ization). There are, however, very interesting aspects of 
the worm scheme which could also perhaps be incorpo- 
rated for the directed loops, e.g., the space-time potential 
introduced in order to more efficiently measure Green's 
functions at long distances ||. 

Comparing implementations of the directed loops 
within the SSE and PIM representations, one difference 
is that in the former there is an adjustable parameter e 
(a constant added to the bond Hamiltonian operators) 
which is not present in the latter. We have noted that 
a non-zero e has generally favorable effects on the auto- 
correlations in the SSE, but a large value is not practical 
since the computation time also increases with e. In some 
sense, the PIM corresponds to SSE with e — > oo, and one 
might therefore expect the PIM implementation to be 
more efficient. However, in practice the opposite is often 
true since already a small e in the SSE can give autocor- 
relation times close to the e — > oo limit, and the com- 
putation time for one MCS can be significantly shorter 
in SSE. PIM algorithms should be more efficient in cases 
where the diagonal part of the Hamiltonian dominates 
in the internal energy, as the PIM configurations (which 
do not contain diagonal operators) then are smaller than 
the corresponding SSE configurations |i7[ . Another im- 
portant aspect is the ease of implementation and opti- 



mization of the simulations for various models. We have 
found the discrete nature of the SSE configuration space, 
where the vertices locally contain all information needed 
to construct the loops, to be a distinct advantage in this 
respect. 

An interesting question is whether the directed-loop 
approach could be used to further extend the applicabil- 
ity of the meron concept pl| for solving sign problems. 
We have shown that for the XXZ model back-tracking in 
the loop construction can be avoided in a larger region of 
the parameter space than where the loop-algorithms pre- 
viously used for studying merons are applicable (specif- 
ically, at non-zero external fields in XY-anisotropic sys- 
tems) . The possibility of generalizing the meron concept 
to the whole non-back-tracking region should be investi- 
gated. 
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APPENDIX A: PROGRAM IMPLEMENTATION 
OF THE SSE METHOD 

The computer implementation of a simulation method 
can of course be done in several different ways and is an 
issue more technical in nature than the mathematical def- 
inition of the underlying algorithm. Nevertheless, for the 
benefit of readers wishing to quickly construct a simple 
but efficient simulation program, we here briefly outline 
the basic aspects of our implementation of the SSE al- 
gorithm with the operator-loop update. Some programs 
are also available online |57f|. 

We first introduce the main data structures used to 
store the SSE configuration in computer memory. The 
state I a) is stored as Spin[s] — ±1 representing the 
up and down spins at the sites s (s — 1, . . . , N). The 
operator-index sequence Sm can be packed into an array 
Sm[j] (j = 0, . . . , M - 1), with Sm[j] = 26 and Sm\j] = 
26+1 (6 = 1, . . . , Nb) corresponding to diagonal and off- 
diagonal bond-6 operators, respectively, and Sm[j] = 
representing fill-in unit operators. The lattice geometry 
can be coded into a list of sites i(b),j(b) connected by 
the bonds 6, i.e., Site[l,b] = i(b), Site[2,b] = j(b). The 
linked vertices are stored in the form of two lists, one 
containing the links and one the vertex types. The ver- 
tex types = 1, . . . , 6 (p = 0, . . . , n — 1) correspond 
to the six vertices shown in Fig. |l|. The links Link[j] 
(j = 0, . . . , An — 1) are arranged such that Link[Ap + i] 
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(p = 0, . . . , n — 1, i = 0, 1, 2, 3) contains the link (which 
is an integer referring to another element in Link) for leg 
i + 1 of vertex p [the leg numbers 1, 2, 3, 4 are defined in 
Eq. (O)]. The double- linked nature of the list implies 
that if Lmfc[(i] = 6 then = a. 

The diagonal update is straight-forward: For j = 
0, . . . , M — 1, a bond & is generated at random for each 
Sm[j] = 0, which is changed to Sm[j] = 2b with the 
probability ([l4|). If the change is made, the number of 
bond-operators present increases by 1, i.e., n — ► n + 1. 
For each diagonal element, i.e., S'rnfj] > and even, 
the change to S'm^'] = and n — > n — 1 is carried 
out with the probability (p"5|), where 6 = Sm[j]/2. If 
5m[j] is an odd integer it corresponds to an off-diagonal 
operator at bond b = Sm[j]/2 and the correspond- 
ing spin states should propagated, i.e., for a = 1,2, 
Spin[Site[a, b]] — > —Spin\Site\a, b]], in order for the ma- 
trix elements in Eqs. (jlj) and ([15]) to be available as 
needed. 

To understand the implementation of the linked vertex 
list, it is useful to keep in mind Fig. |^ and the number- 
ing of the vertex legs exemplified in Eq. (]l^). In or- 
der to construct the lists Link and Vtx, two temporary 
arrays First[s] and Last[s] (s = 1,...,N) are needed. 
First[s] will contain the first vertex leg on site s, i.e., 
First[s] = Ap + i means that the first operator acting 
on site s is the p:th bond-operator (p = 0, . . . , n — 1) in 
Sm and the vertex leg acting on the site is Z = i + 1 
(where I will always be 1 or 2, as these are the legs 
before the operator has acted). In an analogous way, 
Last[s] = Ap + i refers to the last operator acting on site 
s (where I = i + 1 now will always be 3 or 4, since these 
are the legs after the operator has acted). All elements 
are initialized to First[s] = Last[s] = — 1 before the con- 
struction of the linked list starts. Whereas First[s] will 
be set at most once (never if no operator acts on site 
s), Last[s] can be updated several times as the operator 
list Sm [j] is searched from j — to M — 1. For each 
Sm[j] 7^ 0, a counter p of the number (minus 1) of bond- 
operators encountered is incremented by 1 and the bond 
b = Sm[j]/2 is extracted, giving also the corresponding 
sites so = Site[l,b] and si = Site[2, b]. Links can be set 
whenever these sites have already been encountered, i.e., 
for a = 0, 1, if Last{s a ) ^= —1, Link[Ap + a] = Last[s a ] 
and Link[Last[s a ]\ = Ap + a. The last occurrence is up- 
dated to Last[s a ] — Ap + a + 2. If, on the other hand, 
Last(s a ) — —1, only the last and first occurrences are 
recorded, i.e., Last[s a ] = Ap + a + 2 and First[s a ] = 
Ap + a. The spin list Spin is propagated whenever off- 
diagonal operators are encountered, so that the vertex 
types Wx[p] can be recorded (using a map from four 
leg states to the integers 1, ... ,6). After the whole list 
■Sm has been traversed the list of first occurrences is 
used in order to connect the links across the propaga- 
tion boundary i.e., for each s for which Last[s] ^= — 1, 
Link[Last[s\] — First[s] and Link[First[s]] = Last[s\. 

The loop update is repeated Ni times. Each loop starts 
at a random position jo € {0, . . . , An— 1} in the list Link. 



We will move in Link and the current position will be 
referred to as j. We hence begin at j = jo and keep 
jo in order to check at each stage whether the loop has 
closed or not. The current position corresponds to vertex 
number p = j/A and the leg index is U = MOD(j,A) 
(we can now for convenience number the legs 0, . . . , 3). 
This is the entrance leg, and the vertex type is 
The exit probabilities given the entrance leg depend on 
the vertex type and should be stored in a pre-generated 
table. It is convenient to use a list of cumulative exit 
probabilities instead of the individual probabilities, so 
that for a given entrance leg li the exit leg can be obtained 
by successively comparing the cumulative probabilities 
Prob[l e ,l i: Vtx[p\] for exiting at leg l e = 0, . . . , 3 with a 
random number in the range [0, 1]. A corresponding list 
with updated vertex types is also stored, so that after the 
exit leg has been fixed the vertex is updated as Vtx(p) — > 
NewVtx[l e , Vtx[p]]. After this, the current position in 
Link is changed to the one corresponding to the exit leg, 
i.e., j — > Ap + l e . The loop closes at this stage if j = jo- 
If it does not close, we move to the leg linked to j, i.e., 
j — > Link[j]. The loop closes also at this stage if j = jo- 
The two different types of closings, from within the same 
vertex or from a different vertex, are illustrated in Fig. |^. 

After all the Ni loops have been constructed this way, 
the updated vertex list Vtx is mapped onto the corre- 
sponding new operator list Sm. The bond-indices do 
not change, and therefore one can simply cycle through 
the positions j — 0, . . . , M — 1 in the old list one-by- 
one and for each non-zero occurrence extract the bond 
b = Sm[j]/2 and increment an operator counter p — > p+l 
(the corresponding position in the vertex list Vtx). The 
operator-type, diagonal or off-diagonal, can be coded 
in a list OpType[v] = 0, 1, where v — 1, . . . , 6 is the 
vertex type and 0, 1 correspond to diagonal and off- 
diagonal, respectively. The updated operator element is 
then Sm[j] — 2b + OpType[Vtx[p]]. The spin list Spin 
is updated using the list of first occurrences that was 
generated during the construction of the linked list. For 
each site s, if First[s] = -1 no operator acts on that site 
and the spin can be flipped, Spin[s] — > —Spin[s], with 
probability 1/2. Otherwise, the updated spin state is ob- 
tained by extracting the vertex number p = First[s]/ A 
and the leg I = MOD(First[s],A) corresponding to 
the site in question. The corresponding spin state can 
be stored as a pre-generated map, so that Spin[s] — > 
LegSpin[l, Vtx[p]]. 

We have now described all the basic procedures in- 
volved in carrying out one MCS using the general 
operator-loop update. In the special "deterministic" 
cases, where the exit leg is given uniquely by the en- 
trance leg, a number of rather self-evident and trivial 
simplifications are possible (see discussion in Sec. II-D). 

The possibility of aborting loop updates that become 
excessively long can be simply taken into account by exit- 
ing the loop update routine without mapping the already 
accomplished changes in the vertex list Vtx back into a 
new operator list Sm and state Spin. For the XXZ- 
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model the loops typically do not become excessively long 
in practice however, as was demonstrated in a few exam- 
ples in Sec. V. 

The expansion cut-off M is adjusted during equilibra- 
tion of the simulation by keeping it at a x n max where 
n max is the largest n reached so far in the simulation and 
a suitable value for the factor is a ~ 1.25. The number 



of loops Ni is also adjusted during equilibration, to keep 
the average total number of vertices visited in one MCS 
close to some reasonable number, e.g., 2(n), as discussed 
in Sec. II B. We will not discuss the procedures for mea- 
suring operator expectation values here, but published 
forms for several types of estimators j|[ fy} f5| can be 
easily translated into the data structures used above. 
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